Commit Graph

4200 Commits

Author SHA1 Message Date
Kanako Nakai 23f85fe6d4 Fix max number of threads bootstrap docs
Previously the bootstrap check for max number of threads was increased
from 2048 to 4096 yet the docs were never adjusted for this change. This
commit addresses this so the docs are in line with the limit enforced in
the bootstrap check.

Relates #27511
2017-11-28 22:19:04 -05:00
Martijn van Groningen cb1204774b
Include the _index, _type and _id in nested search hits in the top_hits and inner_hits response.
Also include _type and _id for parent/child hits inside inner hits.

In the case of top_hits aggregation the nested search hits are
directly returned and are not grouped by a root or parent document, so
it is important to include the _id and _index attributes in order to know
which documents these nested search hits belong to.

Closes #27053
2017-11-28 14:05:29 +01:00
David Turner a165d1df40
Minor improvements to docs for numeric types (#27553)
* Caps
* Fix awkward wording that took multiple passes to parse
* Floating point _number_
* Something more descriptive about the `scaled_float` scaling factor.
2017-11-28 11:36:07 +00:00
Jason Tedor d8c28044da
Forbid granting the all permission in production
Running with the all permission java.security.AllPermission granted is
equivalent to disabling the security manager. This commit adds a
bootstrap check that forbids running with this permission granted.

Relates #27548
2017-11-27 16:05:27 -05:00
Simon Willnauer f23ed6188d
Skip shard refreshes if shard is `search idle` (#27500)
Today we refresh automatically in the background by default every second.
This default behavior has a significant impact on indexing performance
if the refreshes are not needed.
This change introduces a notion of a shard being `search idle` which a
shard transitions to after (default) `30s` without any access to an
external searcher. Once a shard is search idle all scheduled refreshes
will be skipped unless there are any refresh listeners registered.
If a search hits a `search idle` shard, the search request _parks_
on a refresh listener and will be executed once the next scheduled refresh
occurs. This will also turn the shard into the `non-idle` state immediately.

This behavior is only applied if there is no explicit refresh interval set.
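The idle/non-idle transitions described above can be modelled as a small state machine. This is a hypothetical Python sketch, not Elasticsearch internals: the `SearchIdleShard` class, its method names and the injected clock are illustrative, it assumes the default `30s` threshold, and it ignores the explicit-refresh-interval case.

```python
from typing import Callable, List


class SearchIdleShard:
    """Hypothetical model of a shard that skips scheduled refreshes while search idle."""

    def __init__(self, now: Callable[[], float], idle_after: float = 30.0):
        self._now = now                    # injected clock, in seconds
        self._idle_after = idle_after      # idle threshold (default 30s)
        self._last_search = now()          # last access by an external searcher
        self._listeners: List[Callable[[], None]] = []  # parked searches
        self.refresh_count = 0

    def is_search_idle(self) -> bool:
        return self._now() - self._last_search >= self._idle_after

    def scheduled_refresh(self) -> None:
        # Skip the refresh while idle, unless something waits on a refresh listener.
        if self.is_search_idle() and not self._listeners:
            return
        self.refresh_count += 1
        parked, self._listeners = self._listeners, []
        for run in parked:
            run()                          # parked searches run after the refresh

    def search(self, run: Callable[[], None]) -> None:
        was_idle = self.is_search_idle()
        self._last_search = self._now()    # any search makes the shard non-idle again
        if was_idle:
            self._listeners.append(run)    # park until the next scheduled refresh
        else:
            run()
```

In this model a search on an idle shard does not trigger a refresh itself; it only waits for the next scheduled one, which will no longer be skipped because the shard has become non-idle.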
2017-11-27 18:16:10 +01:00
lcawley af971b3081 [DOCS] Fixed broken link in breaking changes 2017-11-24 09:16:14 -08:00
Christoph Büscher 5661b1c3df Merge branch 'master' into rankeval 2017-11-24 16:25:05 +01:00
kel 4885acb048 Replace `delimited_payload_filter` by `delimited_payload` (#26625)
The `delimited_payload_filter` is renamed to `delimited_payload`. The old name is
deprecated and should be replaced by `delimited_payload`.

Closes #21978
2017-11-24 13:03:19 +01:00
Nhat Nguyen 46b508d6c9
Add wait_for_no_initializing_shards to cluster health API (#27489)
This adds a new option to the cluster health request that allows waiting
until there are no initializing shards.

Closes #25623
2017-11-23 15:09:58 -05:00
Clinton Gormley d1b1d711df Update composite-aggregation.asciidoc
Fixed asciidoc typo
2017-11-23 15:05:14 +01:00
olcbean fd564b10db Deprecate `levenstein` in favor of `levenshtein` (#27409)
Support both spellings throughout 6.x, reporting the incorrect one as deprecated.
2017-11-23 12:53:47 +00:00
Christoph Büscher 5735477283 Fix some documentation typos 2017-11-23 12:31:25 +01:00
Simon Willnauer fadbe0de08
Automatically prepare indices for splitting (#27451)
Today we require users to prepare their indices for split operations.
Yet, we can do this automatically when an index is created which would
make the split feature a much more appealing option since it doesn't have
any 3rd party prerequisites anymore.

This change automatically sets the number of routing shards such that
an index is guaranteed to be able to split once into twice as many shards.
The number of routing shards is scaled towards the default shard limit per index
such that indices with fewer shards can be split more often than
larger ones. For instance, an index with 1 or 2 shards can be split 10x
(until it approaches 1024 shards) while an index created with 128 shards can only
be split 3x by a factor of 2. Please note this is just a default value and users
can still prepare their indices with `index.number_of_routing_shards` for custom
splitting.

NOTE: this change has an impact on the document distribution since we are changing
the hash space. Documents are still uniformly distributed across all shards, but since
we are artificially changing the number of buckets in the consistent hashing space,
documents might be hashed into different shards compared to previous versions.

This is a 7.0 only change.
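The default described above can be sketched as: take the largest `number_of_shards * 2^x` that does not exceed 1024, but always allow at least one split. This Python function is an illustration under that assumption, not the actual Elasticsearch code, and its name is hypothetical.

```python
LOG2_MAX_SHARDS = 10  # log2(1024): the per-index shard limit the default scales towards


def default_routing_shards(num_shards: int) -> int:
    """Largest num_shards * 2**x <= 1024, but with x >= 1 so at least one split is possible."""
    if num_shards < 1:
        raise ValueError("num_shards must be positive")
    ceil_log2 = (num_shards - 1).bit_length()  # ceil(log2(num_shards)) for num_shards >= 1
    num_splits = max(1, LOG2_MAX_SHARDS - ceil_log2)
    return num_shards << num_splits
```

Under this formula an index created with 1 shard gets 1024 routing shards (ten doublings) and one created with 128 shards also gets 1024 (three doublings); an index that already has 1024 shards still gets one possible split (2048 routing shards).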
2017-11-23 09:48:54 +01:00
Mayya Sharipova 57e4d10007
Limit the number of nested documents (#27405)
Add an index-level setting `index.mapping.nested_objects.limit` to control
the number of nested JSON objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceeds this
limit during the parsing of a document.
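A rough sketch of the check, assuming the document is a plain Python dict and the set of `nested` field paths is already known from the mapping (both are hypothetical inputs): every object under a nested path counts toward one per-document total, and parsing fails once that total exceeds the limit. The function names and error text are illustrative.

```python
from typing import Any, Iterable


def count_nested_objects(doc: Any, nested_paths: Iterable[str], prefix: str = "") -> int:
    """Count JSON objects that live under a nested field path, across all fields."""
    nested = set(nested_paths)
    total = 0
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            items = value if isinstance(value, list) else [value]
            for item in items:
                if path in nested and isinstance(item, dict):
                    total += 1  # each nested object is indexed as a separate hidden document
                total += count_nested_objects(item, nested, path)
    elif isinstance(doc, list):
        for item in doc:
            total += count_nested_objects(item, nested, prefix)
    return total


def check_nested_limit(doc: Any, nested_paths: Iterable[str], limit: int = 10000) -> None:
    count = count_nested_objects(doc, nested_paths)
    if count > limit:
        # Illustrative message only, not the exact Elasticsearch error text.
        raise ValueError(f"number of nested documents [{count}] exceeds the limit [{limit}]")
```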

Closes #26962
2017-11-22 10:16:28 -05:00
Takumasa Ochi eed8d1aee5 [DOC] Fix mathematical representation on interval (range) (#27450) 2017-11-21 17:06:26 +00:00
Christoph Büscher d979ccace9 Merge branch 'master' into rankeval 2017-11-21 14:11:02 +01:00
Christoph Büscher 3348d2317f Reworking javadocs, minor changes in some implementation classes 2017-11-21 14:09:04 +01:00
Christoph Büscher 5c65a59369 Extending rank_eval asciidocs 2017-11-21 14:08:42 +01:00
Christoph Büscher d9e67a2c95 Extending `_rank_eval` documentation 2017-11-21 14:08:28 +01:00
Luca Cavanna 29450de7b5
Cross Cluster Search: make remote clusters optional (#27182)
Today Cross Cluster Search requires at least one node in each remote cluster to be up when the cross cluster search is run. Otherwise the whole search request fails, even though some of the data (local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, e.g. when remote shards are missing: those do not fail the whole request but rather yield partial results, and the _shards section in the response indicates that.

This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_unavailable, set to false by default, which allows certain clusters to be skipped if they are down when a cross cluster search request tries to reach them. By default all clusters are mandatory.

Scroll requests support this setting too when they are first initiated (the first search request with the scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down in the meantime.

The search API response now contains a new _clusters section, similar to the _shards section, that is returned whenever one or more clusters were disconnected and got skipped:

````
"_clusters" : {
    "total" : 3,
    "successful" : 2,
    "skipped" : 1
}
````

This section is not part of the response if no clusters have been skipped.

The per-cluster skip_unavailable setting value has also been added to the output of the remote/info API.
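The bookkeeping behind `skip_unavailable` and the `_clusters` section can be sketched as follows. The function and its inputs (a map from cluster alias to whether it was reachable, including the local cluster, and a map from alias to its `skip_unavailable` flag) are hypothetical; the sketch only shows the decision, not the transport layer.

```python
from typing import Dict, Optional


def clusters_section(reachable: Dict[str, bool],
                     skip_unavailable: Dict[str, bool]) -> Optional[Dict[str, int]]:
    """Return the `_clusters` summary, or None when no cluster was skipped.

    Raises ConnectionError if an unreachable cluster may not be skipped.
    """
    skipped = 0
    for alias, ok in reachable.items():
        if ok:
            continue
        if not skip_unavailable.get(alias, False):  # clusters are mandatory by default
            raise ConnectionError(f"remote cluster [{alias}] is unavailable")
        skipped += 1
    if skipped == 0:
        return None  # the section is omitted when nothing was skipped
    total = len(reachable)
    return {"total": total, "successful": total - skipped, "skipped": skipped}
```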
2017-11-21 11:41:47 +01:00
Ulrich Reffle dd0bb580b0 [Docs] Fix broken bulleted lists (#27470) 2017-11-21 11:10:35 +01:00
Jim Ferenczi d1093bd2fa #26800: Fix docs rendering 2017-11-20 08:41:02 +01:00
Michael Basnight 2949c53174
Remove config prompting for secrets and text (#27216)
This commit removes the ability to use ${prompt.secret} and
${prompt.text} as valid config settings. Secure settings has obsoleted
the need for this, and it cleans up some of the code in Bootstrap.
2017-11-19 22:33:17 -06:00
K. Daniel Newton 365dda8748 Correct usage of "an" to "a" in getting started docs
This commit corrects a word usage error in the getting started
docs. Since pronunciation is what determines when to use either "a" or
"an" and the word "ubiquitous" is pronounced /yo͞oˈbikwədəs/, it should
be preceded by "a."

Relates #27420
2017-11-18 07:36:43 -05:00
Mayya Sharipova 858b2c7cb8
Standardize underscore requirements in parameters (#27414)
Standardize underscore requirements in parameters across different types of
requests:
* _index, _type, _source, _id keep their underscores
* params like version and retry_on_conflict will be without underscores
* Throw an error if older versions of parameters are used

BulkRequest, MultiGetRequest, TermVectorsRequest, MoreLikeThisQuery
were changed

Closes #26886
2017-11-17 15:31:52 -05:00
Simon Willnauer a5df2ef538 peanut butter hamburgers 2017-11-17 20:51:39 +01:00
Jim Ferenczi 53462f6499
Make fields optional in multi_match query and rely on index.query.default_field by default (#27380)
* Make fields optional in multi_match query and rely on index.query.default_field by default

This commit adds the ability to send `multi_match` query without providing any `fields`.
When no fields are provided the `multi_match` query will use the fields defined in the index setting `index.query.default_field`
(which in turn defaults to `*`).
The same behavior is already implemented in `query_string` and `simple_query_string` so this change just applies
the heuristic to `multi_match` queries.
Relying on `index.query.default_field` rather than `*` is safer for big mappings that break the 1024 field expansion limit added in 7.0 for all
text queries. For these kinds of mappings the admin can change the `index.query.default_field` in order to make sure that exploratory queries using
`multi_match`, `query_string` or `simple_query_string` do not throw an exception.
2017-11-17 10:25:21 +01:00
Jim Ferenczi 623367d793
Add composite aggregator (#26800)
* This change adds a module called `aggs-composite` that defines a new aggregation named `composite`.
The `composite` aggregation is a multi-buckets aggregation that creates composite buckets made of multiple sources.
The sources for each bucket can be defined as:
  * A `terms` source, values are extracted from a field or a script.
  * A `date_histogram` source, values are extracted from a date field and rounded to the provided interval.
This aggregation can be used to retrieve all buckets of a deeply nested aggregation by flattening the nested aggregation in composite buckets.
A composite bucket is composed of one value per source and is built for each document from the combinations of values in the provided sources.
For instance the following aggregation:

````
"test_agg": {
  "terms": {
    "field": "field1"
  },
  "aggs": {
    "nested_test_agg": {
      "terms": {
        "field": "field2"
      }
    }
  }
}
````
... which retrieves the top N terms for `field1` and for each top term in `field1` the top N terms for `field2`, can be replaced by a `composite` aggregation in order to retrieve **all** the combinations of `field1`, `field2` in the matching documents:

````
"composite_agg": {
  "composite": {
    "sources": [
      {
        "field1": {
          "terms": {
            "field": "field1"
          }
        }
      },
      {
        "field2": {
          "terms": {
            "field": "field2"
          }
        }
      }
    ]
  }
}
````

The response of the aggregation looks like this:

````
"aggregations": {
  "composite_agg": {
    "buckets": [
      {
        "key": {
          "field1": "alabama",
          "field2": "almanach"
        },
        "doc_count": 100
      },
      {
        "key": {
          "field1": "alabama",
          "field2": "calendar"
        },
        "doc_count": 1
      },
      {
        "key": {
          "field1": "arizona",
          "field2": "calendar"
        },
        "doc_count": 1
      }
    ]
  }
}
````

By default this aggregation returns 10 buckets sorted in ascending order of the composite key.
Pagination can be achieved by providing `after` values, the values of the composite key to aggregate after.
For instance, the following aggregation aggregates all composite keys that sort after `arizona, calendar`:

````
"composite_agg": {
  "composite": {
    "after": {"field1": "arizona", "field2": "calendar"},
    "size": 100,
    "sources": [
      {
        "field1": {
          "terms": {
            "field": "field1"
          }
        }
      },
      {
        "field2": {
          "terms": {
            "field": "field2"
          }
        }
      }
    ]
  }
}
````

This aggregation is optimized for indices that set an index sorting that matches the composite source definition.
For instance, the aggregation above could run faster on indices that define an index sorting like this:

````
"settings": {
  "index.sort.field": ["field1", "field2"]
}
````

In this case the `composite` aggregation can early terminate on each segment.
This aggregation also accepts multi-valued fields but disables early termination for these fields even if index sorting matches the sources definition.
This is mandatory because index sorting picks only one value per document to perform the sort.
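The bucket semantics described above (one key per combination of source values, sorted ascending, paged with `after` and `size`) can be modelled in memory. This Python sketch only illustrates those semantics for `terms` sources over documents given as dicts; it is not how the aggregator is implemented, it ignores `date_histogram` sources and early termination, and skipping documents that lack a source value is an assumption of the sketch.

```python
from collections import Counter
from itertools import product
from typing import Any, Dict, List, Optional


def composite_buckets(docs: List[Dict[str, Any]], sources: List[str],
                      after: Optional[Dict[str, Any]] = None,
                      size: int = 10) -> List[Dict[str, Any]]:
    """Return up to `size` composite buckets whose keys sort strictly after `after`."""
    counts: Counter = Counter()
    for doc in docs:
        # Multi-valued fields contribute every combination of their values.
        values = [v if isinstance(v, list) else [v] for v in (doc.get(s) for s in sources)]
        if any(len(v) == 0 or v == [None] for v in values):
            continue  # assumption: a document missing a source value yields no bucket
        for combo in set(product(*values)):  # count each document once per combination
            counts[combo] += 1
    after_key = tuple(after[s] for s in sources) if after else None
    keys = sorted(k for k in counts if after_key is None or k > after_key)
    return [{"key": dict(zip(sources, k)), "doc_count": counts[k]} for k in keys[:size]]
```

Paging then amounts to passing the key of the last bucket of one page as `after` for the next request.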
2017-11-16 15:13:36 +01:00
Jim Ferenczi bf72858ce8
[Docs] Restore section about multi-level parent/child relation in parent-join (#27392)
This section was removed to hide this ability from new users.
This change restores the section and adds a warning regarding the expected performance.

Closes #27336
2017-11-16 11:29:16 +01:00
Russ Cam c42899b27e
Docs/windows installer (#27369)
* Add additional command line parameters along with important note for INSTALLDIR when upgrading
* Update windows installer images
2017-11-15 21:35:54 +11:00
Alexander Reelsen 66b5a43d0e
Logging: Unify log rotation for index/search slow log (#27298)
The existing log rotation configuration allowed the index
and search slow log to grow unbounded. This commit removes the
date based rotation and adds the same size based rotation that
the deprecation log already has.
2017-11-15 10:01:32 +01:00
Tal Levy 5c34533761
add json-processor support for non-map json types (#27335)
The Json Processor originally only supported parsing field values into Maps even
though the JSON spec specifies that strings, null-values, numbers, booleans, and arrays
are also valid JSON types. This commit enables parsing these values now.
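The difference can be illustrated with Python's standard `json` module, which, like the JSON spec, accepts any value type at the top level. The `json_processor` function below is a hypothetical stand-in for the ingest processor: it parses a string field and writes the result to a target field, whatever its type.

```python
import json
from typing import Any, Dict, Optional


def json_processor(doc: Dict[str, Any], field: str,
                   target_field: Optional[str] = None) -> Dict[str, Any]:
    """Parse doc[field] as JSON and store the result, which may be any JSON type."""
    parsed = json.loads(doc[field])  # objects, arrays, strings, numbers, booleans, null
    doc[target_field or field] = parsed
    return doc
```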

response to #25972.
2017-11-13 10:28:19 -08:00
Alexander Reelsen 08037eebff
Tests: Improve size regex in documentation test (#26879)
The regex has been changed to not only be able to deal with something
like `260b`, but also support `1.1kb`.
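A pattern of the kind described, accepting an optional decimal part before the unit, might look like the following. The regex actually used in the documentation test is not reproduced here; this Python version and the set of unit suffixes it accepts are assumptions.

```python
import re

# Hypothetical pattern: an integer or decimal number followed by a byte-size unit.
SIZE_RE = re.compile(r"\d+(?:\.\d+)?(?:b|kb|mb|gb|tb|pb)")


def is_size(text: str) -> bool:
    """True if the whole string looks like a byte size such as 260b or 1.1kb."""
    return SIZE_RE.fullmatch(text) is not None
```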
2017-11-13 10:21:53 +01:00
lcawley 3ed558d718 [DOCS] Fixed link to docker content 2017-11-10 12:10:28 -08:00
Lisa Cawley 9f43d7329b
[DOCS] Move X-Pack-specific Docker content (#27333) 2017-11-10 09:38:32 -08:00
Jim Ferenczi 29331f1127
Fail queries with scroll that explicitly set request_cache (#27342)
Queries that create a scroll context cannot use the cache.
They modify the search context during their execution, so using the cache
can lead to duplicate results for the next scroll query.

This change fails the entire request if the request_cache option is explicitly set
on a query that creates a scroll context (`scroll=1m`) and makes sure internally that we never
use the cache for these queries when the option is not explicitly used.
For 6.x a deprecation log will be printed instead of failing the entire request and the request_cache hint
will be ignored (forced to false).
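The resulting rule fits in a few lines. This is a hypothetical Python rendering (the function, its arguments and the messages are illustrative): `request_cache=None` stands for "not explicitly set", and the `strict` flag switches between failing the request and the 6.x behaviour of logging a deprecation and ignoring the hint.

```python
import warnings
from typing import Optional


def effective_request_cache(scroll: Optional[str], request_cache: Optional[bool],
                            strict: bool = True) -> Optional[bool]:
    """Decide the request_cache flag for a search; None means "use the index default"."""
    if scroll is None:
        return request_cache          # not a scroll: the explicit hint (or None) is kept
    if request_cache is not None:     # explicitly set on a scroll request
        if strict:                    # 7.0 behaviour: fail the whole request
            raise ValueError("[request_cache] cannot be used in a scroll context")
        # 6.x behaviour: emit a deprecation warning and ignore the hint
        warnings.warn("[request_cache] is ignored in a scroll context", DeprecationWarning)
    return False                      # scroll queries never use the cache
```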
2017-11-10 16:02:06 +01:00
Martijn van Groningen b4048b4e7f
Use CoveringQuery to select percolate candidate matches and
extract all clauses from a conjunction query.

When clauses from a conjunction are extracted the number of clauses is
also stored in an internal doc values field (minimum_should_match field).
This field is used by the CoveringQuery and allows the percolator to
reduce the number of false positives when selecting candidate matches and
in certain cases be absolutely sure that a conjunction candidate match
will match and then skip MemoryIndex validation. This can greatly improve
performance.
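The candidate-selection idea can be sketched independently of Lucene. In this hypothetical Python model each stored percolator query keeps the terms extracted from its conjunction plus the number of clauses that must match (the value kept in the minimum_should_match field); a query is a candidate when at least that many of its extracted terms occur in the document. The sketch assumes one term per clause, and the `exact` flag models the case where every clause was a plain term, so a passing query is certain to match and MemoryIndex verification could be skipped.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class StoredQuery:
    name: str
    terms: FrozenSet[str]   # terms extracted from all conjunction clauses
    min_should_match: int   # number of clauses that must match
    exact: bool             # assumption: True when every clause is a plain term query


def select_candidates(doc_terms: Set[str],
                      queries: List[StoredQuery]) -> List[Tuple[str, bool]]:
    """Return (query name, verified) for every query that passes the covering check."""
    result = []
    for q in queries:
        matched = len(q.terms & doc_terms)
        if matched >= q.min_should_match:
            # An exact query that passes the check needs no MemoryIndex verification.
            result.append((q.name, q.exact))
    return result
```

With only a single extracted clause, a query like `a_and_c` in the example below would be selected whenever its one extracted term appeared in the document, which is exactly the kind of false positive the covering check removes.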

Before this change only a single clause was extracted from a conjunction
query. The percolator tried to extract the clause that was rarest
(based on term length) so that fewer candidate queries would be selected
in the first place. However, even with this method there is still a very high
chance that candidate query matches are false positives.

This change also removes the influencing query extraction added via #26081
as this is no longer needed because now all conjunction clauses are extracted.

https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction

Closes #26307
2017-11-10 07:44:42 +01:00
Nicholas Knize 06ff92d237 Add ignore_malformed to geo_shape fields
This commit adds ignore_malformed support to geo_shape field types to skip malformed geoJson fields.

closes #23747
2017-11-09 17:59:05 -06:00
Dimitris Athanasiou 66bef26495
Aggregations: bucket_sort pipeline aggregation (#27152)
This commit adds a parent pipeline aggregation that allows
sorting the buckets of a parent multi-bucket aggregation.

The aggregation also offers [from] and [size] parameters
in order to truncate the result as desired.
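As a rough illustration of the semantics, not the aggregation's implementation, sorting and truncating the parent's buckets could look like this in Python. The sort specification format (a list of `(bucket field, order)` pairs) is a simplification assumed for the sketch, and `from_` stands in for `from`, which is a reserved word in Python.

```python
from typing import Any, Dict, List, Optional, Tuple


def bucket_sort(buckets: List[Dict[str, Any]], sort: List[Tuple[str, str]],
                from_: int = 0, size: Optional[int] = None) -> List[Dict[str, Any]]:
    """Sort buckets by (field, "asc"|"desc") pairs, then keep `size` buckets from `from_`."""
    ordered = list(buckets)
    # Apply sort keys from least to most significant; Python's sort is stable,
    # including when reverse=True, so earlier keys break ties of later ones.
    for field, order in reversed(sort):
        ordered.sort(key=lambda b, f=field: b[f], reverse=(order == "desc"))
    end = None if size is None else from_ + size
    return ordered[from_:end]
```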

Closes #14928
2017-11-09 17:59:57 +00:00
Tal Levy d22fd4ea58
Introduce templating support to timezone/locale in DateProcessor (#27089)
Sometimes systems like Beats may want to extract the date's timezone and/or locale
from a value in a field of the document. This PR adds support for mustache templating
to extract these values.

Closes #24024.
2017-11-09 09:45:32 -08:00
Mayya Sharipova abbe853f1e
Add limits for ngram and shingle settings (#27211) (#27318)
Relates to #25887
2017-11-08 10:12:57 -05:00
Mayya Sharipova 148376c2c5
Add limits for ngram and shingle settings (#27211)
* Add limits for ngram and shingle settings (#27211)

Create index-level settings:
max_ngram_diff - maximum allowed difference between max_gram and min_gram in
NGramTokenFilter/NGramTokenizer. Default is 1.
max_shingle_diff - maximum allowed difference between max_shingle_size and
 min_shingle_size in ShingleTokenFilter.  Default is 3.

Throw an IllegalArgumentException when
trying to create NGramTokenFilter, NGramTokenizer, ShingleTokenFilter
where difference between max_size and min_size exceeds the settings value.
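The validation amounts to a single comparison. A hypothetical Python sketch using the default limits given above (1 for n-grams, 3 for shingles); the function name and error text are illustrative, and `ValueError` plays the role of the `IllegalArgumentException`.

```python
def check_size_diff(min_size: int, max_size: int, max_diff: int, setting: str) -> None:
    """Raise if max_size - min_size exceeds the index-level limit named by `setting`."""
    diff = max_size - min_size
    if diff > max_diff:
        raise ValueError(
            f"difference between max and min size [{diff}] must be less than or equal to "
            f"[{max_diff}]; change the [{setting}] index setting to allow larger values")


MAX_NGRAM_DIFF = 1    # default of the max_ngram_diff index setting
MAX_SHINGLE_DIFF = 3  # default of the max_shingle_diff index setting
```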

Closes #25887
2017-11-07 08:14:55 -05:00
Zachary Tong 6e9e07d6f8
Fix profiling naming issues (#27133)
Some code-paths use anonymous classes (such as NonCollectingAggregator
in terms agg), which messes up the display name of the profiler.  If
we encounter an anonymous class, we need to grab the superclass's name.

Another naming issue was that ProfileAggs were not delegating to the
wrapped agg's name for toString(), leading to ugly display.

This PR also fixes up the profile documentation.  Some of the examples were
executing against empty indices, which shows different profile results
than a populated index (and made for confusing examples).

Finally, I switched the agg display names from the fully qualified name
to the simple name, so that it's similar to how the query profiles work.

Closes #26405
2017-11-06 16:37:33 -05:00
Shubham Aggarwal 5a925cd40c Fixed references to Multi Index Syntax (#27283) 2017-11-06 19:15:36 +01:00
Boris Tyukin 8e9b30417c Update to support bulk updates by query (#27172)
Getting started doc stated that bulk updates by query are not supported but they are now
2017-11-06 17:32:20 +01:00
Boaz Leskes a8ff4960f3 add split index reference in indices.asciidoc
Relates to #26931
2017-11-06 12:55:41 +01:00
Simon Willnauer bd7efa908a Add ability to split shards (#26931)
This change adds a new `_split` API that allows splitting indices into a new
index with a power of two more shards than the source index.  This API works
alongside the `_shrink` API but doesn't require any shard relocation before
indices can be split.

The split operation is conceptually an inverse `_shrink` operation since we
initialize the index with a _synthetic_ number of routing shards that are used
for the consistent hashing at index time. Compared to indices created with
earlier versions this might produce slightly different shard distributions but
has no impact on the per-index backwards compatibility.  For now, the user is
required to prepare an index to be splittable by setting the
`index.number_of_routing_shards` at index creation time.  The setting allows the
user to prepare the index to be splittable in factors of
`index.number_of_routing_shards`, i.e. if the index is created with
`index.number_of_routing_shards: 16` and `index.number_of_shards: 2` it can be
split into `4, 8, 16` shards. This is an intermediate step until we can make
this the default. This also allows us to safely backport this change to 6.x.
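Which shard counts are reachable can be worked out mechanically, assuming a target must be a multiple of the current number of shards and must divide `index.number_of_routing_shards`. A hypothetical Python helper that reproduces the `2` / `16` example above:

```python
from typing import List


def split_targets(number_of_shards: int, number_of_routing_shards: int) -> List[int]:
    """Shard counts an index can be split into, under the divisibility assumption above."""
    if number_of_routing_shards % number_of_shards != 0:
        raise ValueError("routing shards must be a multiple of the number of shards")
    return [n for n in range(number_of_shards * 2, number_of_routing_shards + 1, number_of_shards)
            if number_of_routing_shards % n == 0]
```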

The `_split` operation is implemented internally as a DeleteByQuery on the
Lucene level that is executed while the primary shards execute their initial
recovery. Subsequent merges that are triggered due to this operation will not be
executed immediately. All merges will be deferred until the shards are started
and will then be throttled accordingly.

This change is intended for the 6.1 feature release but will not allow pre-6.1
indices to be split unless these indices have been shrunk before. In that case
these indices can be split backwards into their original number of shards.
2017-11-06 11:37:55 +01:00
Pablo Musa 7b03d68f9f [Docs] Fix minor paragraph indentation error for multiple Indices params (#25535) 2017-11-06 10:20:20 +01:00
Nhat c7ce5a07f2
Add size-based condition to the index rollover API (#27160)
This is to add a max_size condition to the index rollover API. We use
a totalSizeInBytes from DocsStats to evaluate this condition.

Closes #27004
2017-11-04 19:51:48 -04:00
Loek van Gool 67e677f443
Add an example of dynamic field names (#27255) 2017-11-03 23:20:58 +01:00
David Turner fbf8c3ee83
Reinstate recommendation for ≥ 3 master-eligible nodes. (#27204)
In the docs for 1.7 ([doc][doc-1.7], [src][src-1.7]) there was a recommendation
for at least 3 master-eligible nodes "in critical clusters" but this was lost
when that page was updated in 2.0 ([doc][doc-2.0], [src][src-2.0]). I'd like to
reinstate this.

[doc-1.7]: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/modules-node.html
[src-1.7]: 2cbaccb2f2/docs/reference/modules/node.asciidoc
[doc-2.0]: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/modules-node.html#split-brain
[src-2.0]: 4799009ad7/docs/reference/modules/node.asciidoc
2017-11-03 08:48:48 +00:00
Yannick Welsch 7791e72626
Add additional explanations around discovery.zen.ping_timeout (#27231)
Makes it clearer that this setting should only be changed with extra care.
2017-11-02 16:52:10 +01:00
Martijn van Groningen d805c41b28
Added new terms_set query
This query returns documents that match one or more
of the provided terms. The number of terms that must match varies
per document and is either controlled by a minimum should match
field or computed per document in a minimum should match script.
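The per-document rule can be sketched as: count how many of the provided terms the document's field contains, and compare that with the document's own threshold, read either from a field or computed by a callable standing in for the script. This Python sketch is illustrative; the document shape, field names and the callable's `(doc, number_of_terms)` signature are assumptions.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def terms_set_matches(docs: List[Dict[str, Any]], field: str, terms: Iterable[str],
                      msm_field: Optional[str] = None,
                      msm_script: Optional[Callable[[Dict[str, Any], int], int]] = None
                      ) -> List[Dict[str, Any]]:
    """Return docs whose `field` contains at least their own minimum number of `terms`."""
    if (msm_field is None) == (msm_script is None):
        raise ValueError("provide exactly one of msm_field or msm_script")
    wanted = set(terms)
    hits = []
    for doc in docs:
        matched = len(wanted & set(doc.get(field, [])))
        if matched == 0:
            continue  # at least one of the provided terms must match
        if msm_script is not None:
            required = msm_script(doc, len(wanted))  # sees the number of query terms
        else:
            required = doc[msm_field]                # per-document threshold field
        if matched >= required:
            hits.append(doc)
    return hits
```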

Closes #26915
2017-11-01 10:55:18 +01:00
Toby McLaughlin b71f7d3559
Update Docker docs for 6.0.0-rc2 (#27166)
* Update Docker docs for 6.0.0-rc2

* Update the docs to match the new Docker "image flavours" of "basic",
"platinum", and "oss".

* Clarifications for Openshift and bind-mounts

* Bump docker-compose 2.x format to 2.2

* Combine Docker Toolbox instructions for setting vm.max_map_count for
  both macOS + Windows

* devicemapper is not the default storage driver any more on RHEL
2017-11-01 14:24:30 +11:00
Igor Motov d14486bce6
Docs: restore now fails if it encounters incompatible settings (#26933)
This change was introduced in 5.0.0, but the documentation wasn't updated to reflect it.

Closes #26453
2017-10-31 20:04:00 -04:00
javanna 506a2c276d [DOCS] Link remote info API in Cross Cluster Search docs page
Closes #26327
2017-10-31 15:24:46 +01:00
Shai Erera bd0261916c Fix Laplace scorer to multiply by alpha (and not add) (#27125) 2017-10-31 13:08:44 +01:00
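For reference, standard additive (Laplace) smoothing estimates a bigram probability as `(count(w_1 w) + alpha) / (count(w_1) + alpha * V)`, where `V` is the number of distinct terms: alpha multiplies the vocabulary size rather than being added to it. A minimal Python sketch, assuming the scorer follows this textbook formula; the function and argument names are illustrative.

```python
def laplace_bigram_score(bigram_count: int, prev_term_count: int,
                         vocabulary_size: int, alpha: float = 1.0) -> float:
    """Additive smoothing: (c(w1 w) + alpha) / (c(w1) + alpha * V); alpha=1 is add-one."""
    return (bigram_count + alpha) / (prev_term_count + alpha * vocabulary_size)
```

Multiplying keeps the estimates normalised: summed over all `V` possible successors of `w_1`, the numerators add up to `count(w_1) + alpha * V`, so the probabilities sum to 1, which adding alpha instead would not guarantee.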
javanna 34666844b3 [DOCS] Clarify migrate guide and search request validation
Relates to  #26811
2017-10-31 12:36:00 +01:00
kel c3e2bdf20c Raise IllegalArgumentException if query validation failed (#26811)
Closes #26799
2017-10-31 12:17:27 +01:00
Jim Ferenczi 792641a6e3 [Docs] #26541: add warning regarding the limit on the number of fields that can be queried at once in the multi_match query. 2017-10-30 18:03:56 +01:00
Dimitrios Athanasiou 3796471ac4 [Docs] Fix note in bucket_selector 2017-10-30 15:20:46 +00:00
Clarkie b1ce5cf836 [Docs] Fix indentation of examples (#27168) 2017-10-30 11:56:38 +01:00
Jim Ferenczi a4105c6b4a
[Docs] Clarify `span_not` query behavior for non-overlapping matches (#27150)
Closes #27134
2017-10-30 11:29:40 +01:00
Christoph Büscher 8e62314ce4
[Docs] Remove first person "I" from getting started (#27155)
Avoid first person style and consistently switch to an impersonal style in the getting started docs.
2017-10-30 10:45:50 +01:00
Holger Bartnick aa03fb72b7 [Docs] Correct link target for datatype murmur3 (#27143) 2017-10-30 09:31:55 +01:00
Clinton Gormley 0499dc0873 Removed the beta tag from cross-cluster search 2017-10-27 08:51:36 +02:00
Martijn van Groningen f1e944a675
docs: describe parent/child performances 2017-10-26 11:49:13 +02:00
markwalkom 2b864156ca [Docs] Clarify mapping `index` option default (#27104) 2017-10-25 12:42:29 +02:00
David Turner 559fc5a4de Update numbers to reflect 4-byte UTF-8-encoded characters (#27083)
You need 4 bytes for characters outside the BMP, which includes many emoji and
a bunch of less-common writing characters too.
2017-10-24 09:50:47 +01:00
Martijn van Groningen 87c9b79b10
Return the _source of inner hit nested as is without wrapping it into its full path context
Due to a change happened via #26102 to make the nested source consistent
with or without source filtering, the _source of a nested inner hit was
always wrapped in the parent path. This turned out to be not ideal for
users relying on the nested source, as it would require additional parsing
on the client side. This change fixes this: the _source of nested inner hits
is now no longer wrapped by parent json objects, regardless of whether
the _source is included as is or source filtering is used.

Internally source filtering and highlighting rely on the fact that the
_source of nested inner hits is accessible by its full field path, so
in order not to break this, the conversion of the _source into its binary
form is performed in FetchSourceSubPhase, after any potential source filtering
is performed, to make sure the structure of _source of the nested inner hit
is consistent regardless of whether source filtering is performed.

PR for #26944

Closes #26944
2017-10-19 12:04:56 +02:00
İsmail Arılık 71f5e2ce6b Fix a typo. (#27043)
`=== Instalation with Homebrew` should be `=== Installation with Homebrew`.
2017-10-18 09:46:53 -04:00
Divyum Rastogi 984731f36b [DOCS] better formatting of ES cluster status (#26838)
* better formatting of ES cluster status

* change phrase missing data
2017-10-18 01:40:21 -06:00
Pius 400480e3b0 action.auto_create_index can be set as a dynamic cluster setting (#27026)
Per https://github.com/elastic/elasticsearch/pull/20274, action.auto_create_index can be set as a dynamic cluster setting.
2017-10-17 20:44:18 +00:00
Anton Pozhidaev 70668dddf3 Update docs about `script` parameter (#27010)
Added a description of short script form. Also removed references to the obsolete `script.default_lang`.
2017-10-16 05:04:43 -07:00
Simon Willnauer 8dda827ff4 Don't refresh on `_flush` `_force_merge` and `_upgrade` (#27000)
Today all these API calls have a side effect of making documents visible
to search requests. While this is sometimes desired it's an unnecessary side effect,
and now that we have an internal (engine-private) index reader (#26972) we artificially
add a refresh call for bwc. This change removes this side effect in 7.0.
2017-10-16 10:16:35 +02:00
Jason Tedor 8eba1fa17c Add docs on full_id parameter in cat nodes API
This commit adds a note to the docs on the full_id parameter in the cat
nodes API. This is a useful parameter but was not previously documented
anywhere.

Relates #27009
2017-10-13 13:49:25 -04:00
Jason Tedor a7895839a0 Reformat paragraph in template docs to 80 columns
This commit reformats a paragraph in the template docs to fit in 80
columns, as the rest of the doc does and as is a standard that we loosely
adhere to.
2017-10-12 17:52:43 -04:00
Pius 1125bc635c Clarify settings and template on create index
This commit clarifies the interaction between settings specified in a
create index request, and those that would come from any templates that
apply to the create index request.

Relates #26994
2017-10-12 17:48:57 -04:00
agent5566 93a47cf860 Fix a typo in the similarity docs (#26970) 2017-10-12 09:29:25 -07:00
Alexander Kazakov 592ab043dd Change default value to true for transpositions parameter of fuzzy query (#26901) 2017-10-11 15:31:48 +02:00
Nicolas Sierra d6fc4affae Clarify systemd overrides
This commit clarifies how to apply an override to the systemd unit file
for Elasticsearch.

Relates #26950
2017-10-10 13:06:34 -04:00
vurple b3e9aa89dc Add Homebrew instructions to getting started
This commit adds instructions for installing Elasticsearch via Homebrew
to the Getting Started guide.

Relates #26847
2017-10-10 06:21:33 -04:00
Nhat bf4c3642b2 remove _primary and _replica shard preferences (#26791)
The shard preference _primary, _replica and its variants were useful
for the asynchronous replication. However, with the current impl, they
are no longer useful and should be removed.

Closes #26335
2017-10-08 11:03:06 -04:00
shaulzorea 9db21cd23f fixing typo in datehistogram-aggregation.asciidoc (#26924) 2017-10-08 15:12:43 +02:00
Deb Adair b57cb83567 [DOCS] Added info about snapshotting your data before an upgrade. 2017-10-06 12:14:26 -07:00
Adrien Grand 4e1ff8d086 Add documentation about disabling `_field_names`. (#26813)
This field has significant index-time overhead.

Closes #26779
2017-10-06 16:49:15 +02:00
Clinton Gormley eb3ead6561 Update type-field.asciidoc
Fixed asciidoc syntax on deprecated annotation
2017-10-06 11:57:27 +02:00
Steve Kotsopoulos dd95849b62 Document JVM option MaxFDLimit for macOS ()
This commit documents a JVM option that is needed on macOS when raising
file descriptor limits there.

Relates #26900
2017-10-05 14:56:15 -04:00
Md. Abdulla-Al-Sun a40c474e10
Added Bengali Analyzer to Elasticsearch with respect to the Lucene update (PR#238) 2017-10-05 13:25:05 +02:00
Alexander Kazakov 9c95e91471 Expose `fuzzy_transpositions` parameter in fuzzy queries (#26870)
Add fuzzy_transpositions parameter to multi_match and query_string queries.
Add fuzzy_transpositions, fuzzy_prefix_length and fuzzy_max_expansions
parameters to simple_query_string query.
2017-10-05 09:01:09 +02:00
Jim Ferenczi 17b9baf5fd Clarify pure wildcard matching with `query_string` (#26814)
In 5.x pure wildcard queries `*` in `query_string` are rewritten to `exists` query for efficiency.
However, this changed which documents match such queries, because
the `exists` query also returns documents with an empty value for the field.
This change clarifies this behavior for 5.x and beyond.

Closes #26801

* review
2017-10-04 09:55:26 +02:00
Shane Connelly b33c444db5 Shows how to disable CCS from dedicated master/data (#26860)
This is really just the last bit of the OSS component of https://github.com/elastic/elasticsearch/issues/25210
2017-10-03 06:15:30 -07:00
David Roberts a292740b9e Add cgroup memory usage/limit to OS stats on Linux (#26166)
This change adds cgroup memory usage/limit to the OS stats section of
the node stats on Linux.  This information is useful because in Docker
containers the standard node stats report the host memory limit, not
taking account of extra restrictions that may have been applied to the
container.

The original idea was to store these values as Long, truncating any values
outside the range of long.  However, this meant that in the relatively common
case of no limit being applied, users would not see the same value in the OS
stats as they see by querying Linux directly.  So instead the values are stored
as String.  This change places a burden on consumers of the strings to
convert the strings to numbers and decide what to do about extremely large
values, but there will be very few consumers and they would need to have a
policy for dealing with "no limit" in any case.
2017-10-03 12:08:36 +01:00
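Because the cgroup limit and usage are exposed as strings, each consumer has to convert them and choose its own policy for "no limit". A minimal Python sketch of one such policy; the helper name and the 4 EiB cutoff are assumptions for illustration, not anything Elasticsearch defines:

```python
from typing import Optional

# Hypothetical policy: any value at or above 2**62 bytes (4 EiB) is treated as
# "no limit". On cgroup v1 an unlimited container typically reports a value
# just below 2**63, which this cutoff catches.
UNLIMITED_CUTOFF = 2**62


def parse_cgroup_memory(value: str) -> Optional[int]:
    """Convert a cgroup memory string from the OS stats into bytes.

    Returns None when the value should be treated as "no limit". Python ints
    are arbitrary precision, so values outside the range of a Java long are
    parsed without truncation.
    """
    limit = int(value.strip())
    if limit >= UNLIMITED_CUTOFF:
        return None
    return limit


print(parse_cgroup_memory("536870912"))            # prints 536870912
print(parse_cgroup_memory("9223372036854771712"))  # prints None
```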
markwalkom dbea83a1d0 [Docs] Update length-tokenfilter.asciidoc (#26849)
Made it clear what the numeric value of `Integer.MAX_VALUE` is.
2017-10-02 11:01:43 +02:00
Amine Daï 3cb99aeec1 Fix references to vm.max_map_count in Docker docs
This commit fixes some references to vm.max_map_count in the Docker
docs.

Relates #26798
2017-09-29 15:56:18 -04:00
David Turner 8fe9a20982 Forbid negative values for index.unassigned.node_left.delayed_timeout (#26828)
Change delayed_timeout to be a positiveTimeSetting, and add a note that this is a breaking change
2017-09-29 14:44:43 +01:00
Jason Tedor cfd6f35fc3 Add note to docs on /etc/elasticsearch ownership
This commit adds a note to the docs for the RPM and Debian installation
regarding the expected permissions for /etc/elasticsearch.

Relates #26795
2017-09-27 09:22:52 -04:00
olcbean 6952f7b560 Validate top-level keys for create index request (#23755) (#23869)
This commit ensures create index requests do not ignore unknown keys passed to the request.

closes #23755
2017-09-26 09:49:20 -07:00
Jim Ferenczi 74473c1c3d Early termination with index sorting should not set terminated_early in the response (#26597)
Early termination with index sorting always returns the best top N in the response but sets the flag `terminated_early`
in the response. This can be confusing because we use the same flag for `terminate_after` which on the contrary returns partial results.
This change removes the flag when results are not partial (early termination due to index sorting) and keeps it only when `terminate_after` is used.

Closes #26408
2017-09-26 11:37:11 +02:00
Christoph Büscher 6189c54c84 Reject the `index_options` parameter for numeric fields (#26668)
Numeric fields no longer support the index_options parameter. This change rejects the parameter
for numeric field types after it was deprecated in 6.0.

Closes #21475
2017-09-25 23:43:14 +02:00
Christoph Büscher 3827918417 Add configurable `maxTokenLength` parameter to whitespace tokenizer (#26749)
Other tokenizers like the standard tokenizer allow overriding the default
maximum token length of 255 using the `max_token_length` parameter. This change
enables using this parameter also with the whitespace tokenizer. The range that
is currently allowed is from 0 to StandardTokenizer.MAX_TOKEN_LENGTH_LIMIT,
which is 1024 * 1024 = 1048576 characters.

Closes #26643
2017-09-25 17:21:19 +02:00
javanna dee2ae1023 [DOCS] Replace mention of string field type with text and keyword
Closes #25713
2017-09-25 11:12:06 +02:00
Jason Tedor d8bb413b1b Configure heap dump path out of the box
The JVM defaults to dumping the heap to the working directory of
Elasticsearch. For the RPM and Debian packages, this location is
/usr/share/elasticsearch. This directory is not writable by the
elasticsearch user, so by default heap dumps in this situation are
lost. This commit modifies the packaging for the RPM and Debian packages
to set the heap dump path to /var/lib/elasticsearch as the default
location for dumping the heap. This location is writable by the
elasticsearch user by default. We also document this important
setting for cases where /var/lib/elasticsearch is not suitable for
receiving heap dumps.

Relates #26755
2017-09-22 14:22:03 -04:00
Yannick Welsch df5c450e89 Add v6.1 BWC layer for adding wait_for_active_shards to index open command
This commit disables BWC tests while adding a v6.1 BWC layer for the PR #26682
2017-09-22 16:30:07 +02:00
Alexander Kazakov ff737a880c Add wait_for_active_shards parameter to index open command (#26682)
Adds the wait_for_active_shards parameter to the index open command. Similar to the index creation command, the index open command will now, by default, wait until the primaries have been allocated.

Closes #20937
2017-09-22 11:15:03 +02:00
lcawley 06551a8549 [DOCS] Added index-shared4 and index-shared5.asciidoc 2017-09-20 10:54:26 -07:00
Tahmim Ahmed Shibli 34662c9e6d [Docs] Fix name of character filter in example. (#26724) 2017-09-20 17:08:43 +02:00
Christoph Büscher 86b00b84bc Remove parse field deprecations in query builders (#26711)
The `fielddata` field and the use of the `_name` field in the short syntax of the range 
query have been deprecated in 5.0 and can be removed.

The same goes for the deprecated `score_mode` field in HasParentQueryBuilder,
the deprecated `like_text`, `ids` and `docs` parameters in the `more_like_this` query,
the deprecated query name in the short version of the `regexp` query, and several
deprecated alternative field names in other query builders.
2017-09-20 16:22:21 +02:00
Tanguy Leroux c16c653c3e [Test] Fix reference/cat/allocation/line_8 test failure
In this test, 260b is replaced by the regexp \d+b
but the test sometimes produces results like 1.1kb
so this commit adapts the regexp to match values
with decimals
2017-09-18 10:46:19 +02:00
Peter Dyson 1f9e0fd0dd [Docs] improved description for fs.total.available_in_bytes (#26657) 2017-09-18 16:56:19 +10:00
Dimitrios Liappis b789ce737b Docs: Use single-node discovery.type for dev example
For the single-node dev example, the `discovery.type=single-node`[1],[2]
setting is a perfect fit and makes the example shorter and more self-explanatory.

Also expose the transport port, to help with dev use-cases using the 
transport client.

[1] https://github.com/elastic/elasticsearch/pull/23595
[2] https://github.com/elastic/elasticsearch/pull/23598

Relates #26289
2017-09-15 16:14:47 +03:00
Christoph Büscher bea8451b2f Merge branch 'master' into feature/rank-eval 2017-09-15 11:44:51 +02:00
Tanguy Leroux 7f74a620a1 [Docs] Add description for missing fields in Reindex/Update/Delete By Query (#26618)
This commit adds missing descriptions for some fields
in the Reindex/UBQ/DBQ responses.
2017-09-15 11:23:57 +02:00
markwalkom 3d5f70790a [Docs] Update ingest.asciidoc (#26599)
Added a brief note to clarify where configured pipelines are stored (cluster state).
2017-09-15 11:15:31 +02:00
lcawley 120ddd99c3 [DOCS] Remove edit link from ML node 2017-09-14 16:18:29 -07:00
Michael Basnight f385e0cf26 Add bad_request to the rest-api-spec catch params (#26539)
This adds another option, `bad_request`, to the catch params. It also makes sure
that the generic `request` param does not allow 400 either.
2017-09-14 14:24:03 -05:00
Boaz Leskes 1ca0b5e9e4 Introduce a History UUID as a requirement for ops based recovery (#26577)
The new ops-based recovery, introduced as part of #10708, is based on the assumption that all operations below the global checkpoint known to the replica do not need to be synced with the primary. This is based on the guarantee that all ops below it are available on the primary and are identical. Under normal operations this guarantee holds. Sadly, it can be violated when a primary is restored from an old snapshot. At that point the restored primary can miss operations below the replica's global checkpoint or, even worse, may have totally different operations at the same spot. This PR introduces the notion of a history uuid to be able to capture the difference with the restored primary (in a follow up PR).

The history UUID is generated by a primary when it is first created and is synced to the replicas which are recovered via a file-based recovery. The PR adds a requirement to ops-based recovery to make sure that the history uuid of the source and the target are equal. Under normal operations, all shard copies will keep that history uuid for the rest of the index lifetime and thus this is a no-op. However, it gives us a place to guarantee that we fall back to file-based syncing in special events like a restore from snapshot (to be done as a follow-up) and when someone calls the truncate translog command, which can go wrong when combined with primary recovery (this is done in this PR).

We considered in the past using the translog uuid for this function (i.e., syncing it across copies) and thus avoiding an extra identifier. This idea was rejected as it removes the ability to verify that a specific translog really belongs to a specific Lucene index. We also feel that having a history uuid will serve us well in the future.
2017-09-14 21:25:02 +03:00
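The new precondition can be sketched as a small decision function. This Python illustration uses hypothetical parameter names and models only the history-UUID check; a real recovery also has other preconditions (for example, whether the primary still retains the operations to replay).

```python
from typing import Optional, Tuple


def plan_recovery(
    source_history_uuid: str,
    target_history_uuid: str,
    target_global_checkpoint: int,
    source_max_seq_no: int,
) -> Tuple[str, Optional[range]]:
    """Choose ops-based or file-based recovery (simplified sketch)."""
    # Ops-based recovery trusts that every op at or below the target's global
    # checkpoint is identical on both copies. That only holds when both copies
    # share the same history, which is not the case after e.g. a snapshot restore.
    if source_history_uuid != target_history_uuid:
        return ("file-based", None)
    # Same history: only the ops above the target's global checkpoint are replayed.
    return ("ops-based", range(target_global_checkpoint + 1, source_max_seq_no + 1))
```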
Bernd 59600dfe2d [Docs] Correct typo in removal_of_types.asciidoc (#26646) 2017-09-14 15:34:07 +02:00
Christoph Büscher c7c6443b10 [Docs] "The the" is a great band, but ... (#26644)
This removes several occurrences of this typo in the docs and javadocs; it seems to be
a common mistake. Corrections turn up once in a while in PRs, so it is better to correct
some of this in one sweep.
2017-09-14 15:08:20 +02:00
Daniel A. Ochoa 914416e9f4 [Docs] Update link in removal_of_types.asciidoc (#26614)
Fix link to [parent-child relationship].
2017-09-14 10:11:03 +02:00
Jim Ferenczi 401f4ba2ce Fix percolator highlight sub fetch phase to not highlight query twice (#26622)
* Fix percolator highlight sub fetch phase to not highlight query twice

The PercolatorHighlightSubFetchPhase does not override hitExecute, and since it extends HighlightPhase the search hits
are highlighted twice (by the highlight phase and then by the percolator). This does not alter the results, since the second highlighting
just overrides the first one, but it slows down the request because it duplicates the work.
2017-09-14 09:31:14 +02:00
Tanguy Leroux 7404221b55 [Docs] Clarify size parameter in Completion Suggester doc (#26617) 2017-09-13 17:28:31 +02:00
Christoph Büscher 027c555c9b Add soft limit on allowed number of script fields in request (#26598)
Requesting too many script_fields in a search request can be costly
because of script execution. This change introduces a soft limit on the number
of script fields that are allowed per request. The setting can be
changed per index using the index.max_script_fields setting.

Relates to #26390
2017-09-13 17:22:16 +02:00
Jim Ferenczi c709b8d6ac Fix incomplete sentences in parent-join docs (#26623)
* Fix incomplete sentences in parent-join docs

Closes #26590
2017-09-13 16:09:00 +02:00
Christoph Büscher e00db235bc Add a soft limit for the number of requested doc-value fields (#26574)
Requesting too many docvalue_fields in a search request can potentially be costly
because it might incur a per-field per-document seek. This change introduces a
soft limit on the number of fields that can be retrieved. The setting can be
changed per index using the `index.max_docvalue_fields_search` setting.

Relates to #26390
2017-09-13 11:57:06 +02:00
Russ Cam 62a7205577 Add beta tag to MSI Windows Installer (#26616) 2017-09-13 13:23:12 +10:00
David Pilato b01b1c2a58 Remove azure deprecated settings (#26099)
Follow up for #23405.

We remove azure deprecated settings in 7.0:

* The legacy azure settings which started with the `cloud.azure.storage.` prefix have been removed.
This includes `account`, `key`, `default` and `timeout`.
You need to use settings which start with the `azure.client.` prefix instead.

* Global timeout setting `cloud.azure.storage.timeout` has been removed.
You must set it per azure client instead, for example `azure.client.default.timeout: 10s`.
2017-09-12 16:51:44 +02:00
Ryan Ernst c0c5d5488f Docs: Remove remaining references to file and native scripts (#26580)
relates #25690
2017-09-11 11:39:29 -07:00
Lee Hinman 2702918780 Limit the number of expanded fields in query_string and simple_query_string (#26541)
* Limit the number of expanded fields in query_string and simple_query_string

This limits the number of automatically expanded fields for the "all fields"
mode (`"default_field": "*"`) for the `query_string` and `simple_query_string`
queries to 1024 fields.

Resolves #25105

* Add blurb about limit to the docs
2017-09-08 13:37:55 -06:00
Martijn van Groningen b391425da1
Added support to the percolate query to percolate multiple documents
The percolator will add a `_percolator_document_slot` field to all percolator
hits to indicate which document they matched. This number corresponds to
the order in which the documents have been specified in the percolate query.

Also improved the support for multiple percolate queries in a search request.
2017-09-08 17:28:39 +02:00
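To make the slot numbering concrete, here is a toy Python model. Stored queries are plain predicates and documents are dicts, which is an assumption for illustration only; the real percolator works on indexed queries. Each hit lists the zero-based positions of the documents it matched, in the order the documents were given.

```python
from typing import Callable, Dict, List

Query = Callable[[dict], bool]


def percolate(queries: Dict[str, Query], documents: List[dict]) -> Dict[str, List[int]]:
    """Return, per matching query, the slots of the documents it matched."""
    hits: Dict[str, List[int]] = {}
    for name, query in queries.items():
        slots = [slot for slot, doc in enumerate(documents) if query(doc)]
        if slots:  # a query that matches no document is not a hit
            hits[name] = slots
    return hits


queries = {
    "mentions_error": lambda d: "error" in d["message"],
    "from_web": lambda d: d["host"].startswith("web"),
}
docs = [
    {"message": "disk error", "host": "db-1"},
    {"message": "ok", "host": "web-2"},
    {"message": "error again", "host": "web-3"},
]
print(percolate(queries, docs))  # {'mentions_error': [0, 2], 'from_web': [1, 2]}
```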
Lee Hinman cff904bf97 Enable adaptive replica selection by default (#26522)
Relates to #24915
2017-09-07 09:25:05 -06:00
Jim Ferenczi d68d8c9cef Expose duplicate removal in the completion suggester (#26496)
This change exposes the duplicate removal option added in Lucene for the completion suggester
with a new option called `skip_duplicates` (defaults to false).
This commit also adapts the custom suggest collector to handle deduplication when multiple contexts match the input.

Closes #23364
2017-09-07 17:11:01 +02:00
marcocova eeded72b19 [Docs] Fix wrong indent in gateway documentation (#26501)
This changeset fixes a spurious indent that causes a code block to be generated instead of a regular paragraph.
2017-09-05 10:42:58 +02:00
Martijn van Groningen 78e9c96d7f
Added a limit to from + size in top_hits and inner hits.
Relates to #11511
2017-09-05 08:44:45 +02:00
Martijn van Groningen a4d5c6418e
percolator: Rename map_unmapped_fields_as_string setting to map_unmapped_fields_as_text
The `index.percolator.map_unmapped_fields_as_text` is a better name, because unmapped fields are mapped to a text field with default settings
and string is no longer a field type (it is either keyword or text).
2017-09-04 14:12:44 +02:00
shaulzorea 666cf4b872
fixing typo in nested-aggregation.asciidoc (#26481) 2017-09-04 06:42:44 +02:00
Jason Tedor 279be13a00 Clarify development vs. production mode
The definition of development vs. production mode has evolved slightly
over time (with the introduction of single-node discovery). This commit
clarifies the documentation to better account for this adjustment.

Relates #26460
2017-09-02 09:47:39 -04:00
Christoph Büscher f8fc0f3ebe [Tests] Check that quoteAnalyzer overrides analyzer in `query_string` query (#26473)
Adding a check to QueryStringQueryBuilderTests that checks the override
behaviour of `quote_analyzer`, also adding documentation explaining the use of
this parameter in `query_string` query.

Closes #25417
2017-09-02 11:53:02 +02:00
Lee Hinman 4157eead22 [DOCS] Add documentation for adaptive replica selection
This adds a blurb for adaptive replica selection since it was previously
undocumented.

Relates to #24915
2017-09-01 09:53:22 -06:00
Alexander Reelsen 80d0a32f8e ScriptService: Replace max compilation per minute setting with max compilation rate (#26399)
The current script service has a script compilation limit for a one
minute window. This is set to a small default value of 15. Instead of
increasing that default value, this commit introduces a new setting
that allows configuring a rate per time unit, so that the script service can deal with bursts better.

The new setting is named `script.max_compilations_rate`,
requires a nonnegative number and a positive time value.

The default is `75/5m`, which is equivalent to the existing 15 per minute.
2017-09-01 10:15:27 +02:00
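A `count/duration` rate maps naturally onto a token bucket: the bucket holds up to `count` tokens and refills at `count / duration`, so a burst of compilations is allowed as long as the average stays under the limit. The Python sketch below assumes that model and a small subset of time units; it illustrates the idea and is not taken from ScriptService.

```python
import re
from typing import Optional, Tuple

# Assumed subset of time units, in seconds.
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def parse_rate(value: str) -> Tuple[int, float]:
    """Parse a rate such as "75/5m" into (count, window in seconds)."""
    match = re.fullmatch(r"(\d+)/(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"invalid rate: {value!r}")
    count = int(match.group(1))  # nonnegative by construction
    window = int(match.group(2)) * _UNITS[match.group(3)]
    if window <= 0:
        raise ValueError("the time value must be positive")
    return count, window


class CompilationLimiter:
    """Token bucket holding up to `count` tokens, refilled at count/window per second."""

    def __init__(self, rate: str):
        count, window = parse_rate(rate)
        self.capacity = float(count)
        self.tokens = float(count)  # start full so an initial burst is allowed
        self.refill_per_second = count / window
        self.last: Optional[float] = None

    def try_compile(self, now: float) -> bool:
        """Return True if a compilation at time `now` (in seconds) is allowed."""
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `75/5m`, all 75 compilations may happen at once, after which one token comes back every 4 seconds; averaged over the window that is the same 15 per minute as before.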
Matt Weber 140395c83f Multi-level Nested Sort with Filters (#26395)
Multi-level Nested Sort with Filters

Allow multiple levels of nested sorting where each level can have its own filter.
Backward compatible with previous single-level nested sort.
2017-08-30 18:52:56 +02:00
Martijn van Groningen c821dce3fe
Revert "Multi-level Nested Sort with Filters"
This reverts commit 6377afa6c3.
2017-08-30 14:53:25 +02:00
Martijn van Groningen 6377afa6c3
Multi-level Nested Sort with Filters
Allow multiple levels of nested sorting where each level
can have its own filter. Backward compatible with
previous single-level nested sort.
2017-08-30 14:30:20 +02:00
Tanguy Leroux 3d07bce504 [Docs] Fix tophits-aggregation.asciidoc 2017-08-30 13:06:44 +02:00
Tanguy Leroux 643eb286dc [Docs] Convert remaining code snippets in docs (#26422)
This commit converts the last remaining code snippets so that they are
now testable.
2017-08-30 12:11:10 +02:00
Tanguy Leroux db54c4dc7c [Docs] Convert more doc snippets (#26404)
This commit converts some remaining doc snippets so that they are now
testable.
2017-08-30 09:30:36 +02:00
Jim Ferenczi 86d97971a4 Remove the _all metadata field (#26356)
* Remove the _all metadata field

This change removes the `_all` metadata field. This field is deprecated in 6
and cannot be activated for indices created in 6 so it can be safely removed in
the next major version (i.e. 7).
2017-08-28 17:43:59 +02:00
Tanguy Leroux f95dec797d [Docs] Convert more doc snippets (#26359)
This commit converts some remaining doc snippets so that they are now
testable.
2017-08-28 11:23:09 +02:00
shaulzorea a827d545d8 [Docs] Fixing phrasing in has-parent-query.asciidoc (#26396) 2017-08-28 10:26:59 +02:00
Colin Goodheart-Smithe 6b23ee8040
[TEST] Fixes docs tests
587409e893 introduced a bug where the docs tests tried to test an example of a request format that contained placeholder values. This change adds `NOTCONSOLE` to that snippet, as the immediately following snippet tests a concrete example.

220212dd69 introduced a bug because the test substitution was looking for `otherhost` where the snippet contained `oldhost`. This change fixes the substitution.
2017-08-24 10:45:53 +01:00
Jason Tedor 587409e893 Fix logging level docs
This commit fixes an issue with the logging level docs reported as
unconverted snippets.
2017-08-23 21:21:56 -04:00
debadair 220212dd69 WIP: Edits to upgrade docs (#26155)
* [DOCS] Updated and edited upgrade information.

* Incorporated Nik's feedback.
2017-08-23 14:07:34 -07:00
Jason Tedor bb5b771098 Add docs regarding setting logging levels
This commit clarifies the various ways of setting logging levels and in
what circumstances they are appropriate.

Relates #26344
2017-08-23 13:21:44 -04:00
Jim Ferenczi de1e4e0c15 Accept an array of field names and boosts in the index.query.default_field setting (#26320)
* Accept an array of field names and boosts in the index.query.default_field setting

This commit allows defining an array of field names and boosts for the index setting `index.query.default_field`.
The format is equivalent to the `fields` options of the full text search queries (e.g. field_name^boost).
This commit also makes this setting dynamically updatable.

Fixes #25946
2017-08-23 15:39:54 +02:00
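Each `field_name^boost` entry can be split on its last `^`. A minimal Python sketch, assuming an entry without a boost defaults to 1.0; the helper name is hypothetical:

```python
from typing import List, Tuple


def parse_default_fields(entries: List[str]) -> List[Tuple[str, float]]:
    """Split entries like "title^2" into (field, boost) pairs."""
    parsed = []
    for entry in entries:
        field, sep, boost = entry.rpartition("^")
        if not sep:  # no "^": the whole entry is the field name
            field, boost_value = entry, 1.0
        else:
            boost_value = float(boost)
        if not field:
            raise ValueError(f"missing field name in {entry!r}")
        parsed.append((field, boost_value))
    return parsed


print(parse_default_fields(["title^2", "body", "*"]))
```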
Christoph Büscher 62a7cac3a0 Merge branch 'master' into feature/rank-eval 2017-08-23 11:19:16 +02:00
Christoph Wurm 0120448f76 Expand How to tune for disk usage (#25562) 2017-08-21 12:07:54 -07:00
Jim Ferenczi a48616272f #26173: Removed global_ordinals_hash and global_ordinals_low_cardinality execution hints deprecated in 6.1 2017-08-21 20:44:34 +02:00
Jim Ferenczi 977dcfe789 Deprecate global_ordinals_hash and global_ordinals_low_cardinality (#26173)
* Deprecate global_ordinals_hash and global_ordinals_low_cardinality

This change deprecates the `global_ordinals_hash` and `global_ordinals_low_cardinality` and
makes the `global_ordinals` execution hint choose internally if global ords should be remapped or use the segment ord directly.
These hints are too sensitive and expert to be exposed and we should be able to take the right decision internally based on the agg tree.
2017-08-21 19:12:27 +02:00
Christoph Büscher 5dae277bb2 Support distance units in GeoHashGrid aggregation precision (#26291)
Currently the `precision` parameter must be a precision level
in the range of [1,12]. In #5042 it was suggested also supporting
distance units like "1km" to automatically approximate the needed
precision level. This change adds this support to the Rest API by
making use of GeoUtils#geoHashLevelsForPrecision.

Plain integer values without a unit are still treated as precision
levels like before. Distance values that are too small to be represented
by a precision level of 12 (values approx. less than 0.056m) are
rejected.

Closes #5042
2017-08-21 17:29:28 +02:00
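To see how a distance can map to a precision level: each geohash character adds 5 bits, interleaved between longitude and latitude starting with longitude. The Python sketch below computes approximate cell dimensions and picks the coarsest level whose cells fit within the requested distance. The Earth constants, the "largest side fits" rule and the helper names are assumptions; `GeoUtils#geoHashLevelsForPrecision` uses its own formula, so exact cut-offs may differ.

```python
import re
from typing import Tuple, Union

EQUATOR_M = 40_075_016.686    # assumed equatorial circumference, metres
MERIDIAN_M = 20_003_931.4585  # assumed pole-to-pole meridian length, metres
MAX_LEVEL = 12
_UNITS = {"km": 1000.0, "m": 1.0, "cm": 0.01}  # assumed subset of distance units


def cell_size(level: int) -> Tuple[float, float]:
    """Approximate (width, height) in metres of a geohash cell at the equator."""
    bits = 5 * level
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when bits is odd
    lat_bits = bits // 2
    return EQUATOR_M / 2**lon_bits, MERIDIAN_M / 2**lat_bits


def level_for_distance(meters: float) -> int:
    """Coarsest level whose cell width and height are both <= meters."""
    for level in range(1, MAX_LEVEL + 1):
        if max(cell_size(level)) <= meters:
            return level
    raise ValueError(f"{meters} m is finer than precision level {MAX_LEVEL}")


def parse_precision(value: Union[int, str]) -> int:
    """Plain integers are precision levels; values with a unit are distances."""
    if isinstance(value, int) or value.isdigit():
        level = int(value)
        if not 1 <= level <= MAX_LEVEL:
            raise ValueError(f"precision level must be in [1, {MAX_LEVEL}]")
        return level
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(km|m|cm)", value)
    if match is None:
        raise ValueError(f"invalid precision: {value!r}")
    return level_for_distance(float(match.group(1)) * _UNITS[match.group(2)])
```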
Christoph Büscher 254c1b28e9 [Docs] Clarify behaviour of Pattern Capture Token Filter during search (#26278)
There was some confusion about the fact that tokens emitted from a Pattern
Capture Token Filter are treated as synonyms when used to analyze a search
query. This commit adds an explanation to the note in the docs to emphasize this
behaviour.

Closes #25746
2017-08-21 14:56:52 +02:00
Jim Ferenczi 4bce727165 Refactor simple_query_string to handle text part like multi_match and query_string (#26145)
This change is a continuation of #25726 that aligns field expansions for the simple_query_string with the query_string and multi_match query.
The main changes are:

 * For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping.

 * For partial field names (with * suffix), the expansion is done only on keyword, text, date, ip and number field types. Other field types are simply ignored.

 * For all fields (*), the expansion is done on accepted field types only (see above) and metadata fields are also filtered.

The use_all_fields option is deprecated in this change and can be replaced by setting `*` in the fields parameter.
This commit also changes how text fields are analyzed. Previously the default search analyzer (or the provided analyzer) was used to analyze every text part,
ignoring the analyzer set on the field in the mapping. With this change, the field analyzer is used instead unless an analyzer has been forced in the parameter of the query.

Finally, now that all full text queries can handle the special "*" expansion (`all_fields` mode), the `index.query.default_field` is now set to `*` for indices created in 6.
2017-08-21 13:12:27 +02:00
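The three expansion rules can be sketched as a function over a flat field-to-type mapping. The mapping shape, the helper name, and treating names starting with `_` as metadata fields are simplifying assumptions; an empty result stands in for the match-no-docs rewrite.

```python
from typing import Dict, List

# Field types eligible for expansion per the description above
# (keyword, text, date, ip and the numeric types).
ACCEPTED_TYPES = {
    "keyword", "text", "date", "ip",
    "long", "integer", "short", "byte",
    "double", "float", "half_float", "scaled_float",
}


def expand_fields(pattern: str, mapping: Dict[str, str]) -> List[str]:
    """Expand one field pattern against a {field: type} mapping."""
    if pattern == "*":
        # All fields: accepted types only, metadata fields filtered out.
        return [f for f, t in mapping.items()
                if t in ACCEPTED_TYPES and not f.startswith("_")]
    if pattern.endswith("*"):
        # Partial name: prefix match, restricted to accepted types.
        prefix = pattern[:-1]
        return [f for f, t in mapping.items()
                if f.startswith(prefix) and t in ACCEPTED_TYPES]
    # Exact name: used as-is if mapped; otherwise nothing matches.
    return [pattern] if pattern in mapping else []
```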
Atothendrew c30d6ebcbb [Docs] Correct json example in ingest-node.asciidoc (#26221) 2017-08-21 11:07:44 +02:00
Antonio Matarrese 93cc2d0372 Configurable distance limit with the AUTO fuzziness. (#25731)
Make the distance thresholds configurable with the AUTO fuzziness.
2017-08-21 11:00:20 +02:00
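With `AUTO`, the edit distance is derived from the term length using two thresholds, written `AUTO:[low],[high]` and defaulting to 3 and 6: terms shorter than `low` must match exactly, terms shorter than `high` allow one edit, and longer terms allow two. A minimal Python sketch of that rule (the function name is hypothetical):

```python
def auto_fuzziness(term: str, spec: str = "AUTO") -> int:
    """Edit distance allowed for `term` under an AUTO or AUTO:low,high spec."""
    low, high = 3, 6  # defaults when plain "AUTO" is used
    if spec != "AUTO":
        prefix = "AUTO:"
        if not spec.startswith(prefix):
            raise ValueError(f"not an AUTO fuzziness spec: {spec!r}")
        low_str, high_str = spec[len(prefix):].split(",")
        low, high = int(low_str), int(high_str)
    length = len(term)
    if length < low:
        return 0  # must match exactly
    if length < high:
        return 1  # one edit allowed
    return 2      # two edits allowed
```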
michaelbaamonde c0dbd236c3 Fix typo re: bootstrap.memory_lock in Docker docs. (#26265)
`bootstrap_memory_lock` should be `bootstrap.memory_lock`.
2017-08-18 11:55:56 -04:00
Lee Hinman f18ec511ca Disallow : in cluster and index/alias names (#26247)
We use `:` for cross-cluster search (e.g. `cluster:index`), so we should
not allow this ambiguity in cluster or index names.

Relates to #23892
2017-08-17 14:57:26 -06:00
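Forbidding `:` in names keeps the `cluster:index` syntax unambiguous, because the first colon can always be read as the separator between a remote cluster alias and an index name. A small Python sketch of both sides, with hypothetical helper names:

```python
from typing import Optional, Tuple


def validate_name(name: str) -> None:
    """Reject cluster, index or alias names that contain ':'."""
    if ":" in name:
        raise ValueError(f"invalid name {name!r}: must not contain ':'")


def split_index_expression(expression: str) -> Tuple[Optional[str], str]:
    """Split "cluster:index" into (cluster, index); local indices have no cluster."""
    cluster, sep, index = expression.partition(":")
    if not sep:
        return None, expression
    # Since names cannot contain ':', neither part may contain one either.
    validate_name(cluster)
    validate_name(index)
    return cluster, index
```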
Nik Everett 7e76b2a8c3 Docs: fold section into current chapter
In #25602 we added a new *chapter* on aggregating by day of the
week. We intended to add a new *section* but we were missing a
single `=`.
2017-08-17 11:19:02 -04:00
Nik Everett 6d2c40e546 Enforce that responses in docs are valid json (#26249)
All of the snippets in our docs marked with `// TESTRESPONSE` are
checked against the response from Elasticsearch but, due to the
way they are implemented, they are actually parsed as YAML instead
of JSON. Luckily, all valid JSON is valid YAML! Unfortunately,
that means that invalid JSON has snuck into the examples!

This adds a step during the build to parse them as JSON and fail
the build if they don't parse.

But no! It isn't quite that simple. The displayed text of some of
these responses looks like:
```
{
    ...
    "aggregations": {
        "range": {
            "buckets": [
                {
                    "to": 1.4436576E12,
                    "to_as_string": "10-2015",
                    "doc_count": 7,
                    "key": "*-10-2015"
                },
                {
                    "from": 1.4436576E12,
                    "from_as_string": "10-2015",
                    "doc_count": 0,
                    "key": "10-2015-*"
                }
            ]
        }
    }
}
```

Note the `...` which isn't valid json but we like it anyway and want
it in the output. We use substitution rules to convert the `...`
into the response we expect. That yields a response that looks like:
```
{
    "took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,
    "aggregations": {
        "range": {
            "buckets": [
                {
                    "to": 1.4436576E12,
                    "to_as_string": "10-2015",
                    "doc_count": 7,
                    "key": "*-10-2015"
                },
                {
                    "from": 1.4436576E12,
                    "from_as_string": "10-2015",
                    "doc_count": 0,
                    "key": "10-2015-*"
                }
            ]
        }
    }
}
```

That is what the tests consume but it isn't valid JSON! Oh no! We don't
want to go update all the substitution rules because that'd be huge and,
ultimately, wouldn't buy much. So we quote the `$body.took` bits before
parsing the JSON.

Note the responses that we use for the `_cat` APIs are all converted into
regexes and there is no expectation that they are valid JSON.

Closes #26233
2017-08-17 09:02:10 -04:00
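The quoting step can be sketched in Python: wrap each bare `$body...` placeholder in double quotes, then hand the result to a strict JSON parser. The placeholder regex is an assumption based on the `$body.took` and `$body._shards` examples above, and the build's actual implementation is not shown here.

```python
import json
import re

# Bare $body.<path> placeholders that are not already inside a JSON string.
PLACEHOLDER = re.compile(r'(?<!")\$body(?:\.[A-Za-z0-9_]+)+')


def parse_expected_response(snippet: str) -> dict:
    """Quote substitution placeholders, then parse the snippet as strict JSON."""
    quoted = PLACEHOLDER.sub(lambda m: json.dumps(m.group(0)), snippet)
    return json.loads(quoted)  # raises json.JSONDecodeError on invalid JSON


expected = parse_expected_response(
    '{"took": $body.took, "timed_out": false, "_shards": $body._shards, '
    '"hits": $body.hits, "aggregations": {"range": {"buckets": []}}}'
)
print(expected["took"])  # prints $body.took
```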
Lee Hinman cfad6688b0 Migrate migration docs from 6.0 to 7.0 (#26227)
* Migrate migration docs from 6.0 to 7.0

Since we only keep one version of migration docs and master is now on 7.0, we
should migrate these so breaking changes can be added in the right place.

* Remove release notes as well

They link to the migration guides, so they have to go.

* Add placeholder notes for 7.0 so doc build is happy
2017-08-16 13:12:44 -06:00
Jason Tedor 6d8ef3153c Fix script setting names in script security docs
The names of two settings in the script security docs are incorrect,
referring to the prefix as "scripts" instead of "script". This commit
fixes this issue.

Relates #26236
2017-08-16 09:07:46 -04:00
R Tsien c7c8a9d1a9 "result" : created -> "result" : "created" (#25446) 2017-08-15 14:53:05 -06:00
Nik Everett 5ea6f90968 Further improve docs for requests_per_second
In #26185 we made the description of `requests_per_second` sane
for reindex. This improves on the description by using some more
common vocabulary ("batch size", etc) and improving the formatting
of the example calculation so it stands out and doesn't require
scrolling.
2017-08-15 15:57:07 -04:00
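As described in the reindex docs, throttling pads each batch: the target time for a batch is its size divided by `requests_per_second`, and the wait is that target minus the time already spent writing. A Python sketch of the arithmetic; the function name is hypothetical, and `-1` is treated as "unthrottled" as in the API:

```python
def batch_wait_seconds(batch_size: int, requests_per_second: float,
                       write_time_seconds: float) -> float:
    """Padding to wait after writing a batch so the average rate is respected."""
    if requests_per_second == -1:  # -1 disables throttling
        return 0.0
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive or -1")
    target_time = batch_size / requests_per_second
    # If writing already took longer than the target, there is nothing to wait for.
    return max(0.0, target_time - write_time_seconds)


# A batch of 1000 at 500 requests per second has a 2s target;
# after 0.5s of writing, the worker waits the remaining 1.5s.
print(batch_wait_seconds(1000, 500, 0.5))  # prints 1.5
```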
Berg Lloyd-Haig dd4f7eee22 Docs disambiguate reindex's requests_per_second (#26185)
Reindex's docs were somewhere between unclear and
inaccurate around `requests_per_second`. This makes
them much more clear and accurate.
2017-08-15 15:57:06 -04:00
Lisa Cawley 07f67cd8b5 [DOCS] Cleanup link for ec2 discovery (#26222) 2017-08-15 11:49:58 -07:00
Zachary Tong d26becc040 Fix NPE when `values` is omitted on percentile_ranks agg (#26046)
An array of values is required because there is no default (or
reasonable way to set a default).  But validation for values
only happens if it is actually set.  If the values param is omitted
entirely, then the agg builder will NPE.
2017-08-15 13:09:15 -04:00
Antonio Matarrese 93edbc0030 describe how to apply best_compression (#25706)
* describe how to apply best_compression

* update description
2017-08-15 16:44:38 +02:00
dlindeque 81c6b9e6f4 [Docs] Fix typo in api-conventions.asciidoc (#26171) 2017-08-15 14:09:10 +02:00
Alexander Reelsen 483086220f Docs: Add search response took time explanation (#26202) 2017-08-15 08:43:26 +02:00
Jason Tedor e9687622bd Rename CONF_DIR to ES_PATH_CONF
The environment variable CONF_DIR was previously inconsistently used in
our packaging to customize the location of Elasticsearch configuration
files. The importance of this environment variable has increased
starting in 6.0.0 as it's now used consistently to ensure Elasticsearch
and all secondary scripts (e.g., elasticsearch-keystore) use the
same configuration. The name CONF_DIR is there for legacy reasons yet
it's too generic. This commit renames CONF_DIR to ES_PATH_CONF.

Relates #26197
2017-08-15 06:19:06 +09:00
Andy Bristol 7e3cd6a019 reindex: automatically choose the number of slices (#26030)
In reindex APIs, when using the `slices` parameter to choose the number of slices, adds the option to specify `slices` as "auto" which will choose a reasonable number of slices. It uses the number of shards in the source index, up to a ceiling. If there is more than one source index, it uses the smallest number of shards among them.

This gives users an easy way to use slicing in these APIs without having to make decisions about how to configure it, as it provides a good-enough configuration for them out of the box. This may become the default behavior for these APIs in the future.
2017-08-11 08:25:25 -07:00
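The `slices=auto` rule reduces to a one-liner. In this Python sketch the ceiling is a parameter because the commit message does not state its value; the function name is hypothetical.

```python
from typing import Sequence


def auto_slices(source_shard_counts: Sequence[int], ceiling: int) -> int:
    """Slices for slices=auto: the smallest source shard count, capped at ceiling."""
    if not source_shard_counts:
        raise ValueError("at least one source index is required")
    return min(min(source_shard_counts), ceiling)
```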
Martijn van Groningen 636e85e5b7
percolator: Hint what clauses are important in a conjunction query based on fields
The percolator field mapper doesn't need to extract all terms and ranges from a bool query with must or filter clauses.
To help the default extraction behavior, boost fields can be configured so that fields that are known for not being
selective enough can be ignored in favor of other fields, or clauses with specific fields can forcefully take precedence over other clauses.
This can help select clauses for fields that don't match with a lot of percolator queries over other clauses, and thus improve the performance of the percolate query.

For example, a status-like field is something that should be configured as an ignore field.
Queries on this field tend to match with more documents, so if clauses for this field
get selected as the best clause, that isn't very helpful for the candidate query that the
percolate query generates to filter out percolator queries that are likely not going to match.
2017-08-11 15:32:01 +02:00
Daniel Mitterdorfer 637cc872f4 Remove unused Netty-related settings (#26161)
With this commit we remove the following three previously unused 
(and undocumented) Netty 4 related settings:

* transport.netty.max_cumulation_buffer_capacity,
* transport.netty.max_composite_buffer_components and
* http.netty.max_cumulation_buffer_capacity 

from Elasticsearch.
2017-08-11 12:03:00 +02:00
Martijn van Groningen 076167fbe5
inner hits: Unfiltered nested source should keep its full path
like filtered nested source.

Closes #23090
2017-08-10 15:58:29 +02:00
Nik Everett 7d5f00d1d2 Docs: Note feature missing from reindex
Reindex-from-remote doesn't support slices and I hadn't documented
that.

Closes #26114
2017-08-09 09:44:52 -04:00
Jim Ferenczi a7e1610134 Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string (#26097)
* Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string

This change adds a new parameter called auto_generate_synonyms_phrase_query (defaults to true).
This option can be used in conjunction with synonym_graph token filter to generate phrase queries
when multi-term synonyms are encountered.
For example, a synonym like "ny, new york" would produce the following boolean query when "ny city" is parsed:
((ny OR "new york") AND city)

Note how the multi-term synonym "new york" produces a phrase query.
2017-08-09 12:15:09 +02:00
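The rewrite in the example can be mimicked at the string level. This Python sketch assumes single-token inputs, a simple synonym dictionary and the AND operator used in the example; the real implementation works on the token graph produced by `synonym_graph`, not on strings.

```python
from typing import Dict, List


def render_query(tokens: List[str], synonyms: Dict[str, List[str]]) -> str:
    """Render a boolean query in which multi-term synonyms become phrases."""
    clauses = []
    for token in tokens:
        alternatives = [token] + synonyms.get(token, [])
        # Multi-term synonyms are quoted, i.e. turned into phrase queries.
        rendered = [f'"{alt}"' if " " in alt else alt for alt in alternatives]
        if len(rendered) == 1:
            clauses.append(rendered[0])
        else:
            clauses.append("(" + " OR ".join(rendered) + ")")
    return "(" + " AND ".join(clauses) + ")"


print(render_query(["ny", "city"], {"ny": ["new york"]}))
# ((ny OR "new york") AND city)
```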
Ian Fisk 8cb1391f40 Docs: Use correct field name in Field Value factor docs. (#26104) 2017-08-08 16:34:20 -04:00
markwalkom 746487c3f3 Update templates.asciidoc (#26036)
Dropped in a few links to index settings and mappings to make things easier to jump to.
2017-08-08 11:29:11 +02:00
Adrien Grand f0cba4fce5 Add a scripted similarity. (#25831)
The goal of this similarity is to help users who would like to keep the
functionality of the `tf-idf` similarity, which we want to remove, or who
have specific use cases (disabling idf, disabling tf, disabling length norm,
etc.), without having to build a custom plugin and familiarize themselves
with the low-level Lucene API.
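The kind of scoring a scripted similarity lets users express can be sketched in Java. This is a hypothetical illustration, not the Elasticsearch scripting API: it roughly follows the classic Lucene TF-IDF formula (square-root term frequency, a smoothed logarithmic idf, and a 1/sqrt(length) norm), and the boolean flags are invented knobs standing in for the use cases listed above.

```java
class TfIdfSketch {

    // Score of one term in one document, roughly following classic Lucene TF-IDF:
    // sqrt(freq) * (ln((docCount + 1) / (docFreq + 1)) + 1) * 1 / sqrt(docLength).
    // useTf, useIdf and useNorm are hypothetical switches; a disabled factor counts as 1.
    static double score(double boost, int freq, long docFreq, long docCount, int docLength,
                        boolean useTf, boolean useIdf, boolean useNorm) {
        double tf = useTf ? Math.sqrt(freq) : 1.0;
        double idf = useIdf ? Math.log((docCount + 1.0) / (docFreq + 1.0)) + 1.0 : 1.0;
        double norm = useNorm ? 1.0 / Math.sqrt(docLength) : 1.0;
        return boost * tf * idf * norm;
    }

    public static void main(String[] args) {
        // freq=4, docFreq=9, docCount=99, length=16:
        // tf = 2, idf = ln(10) + 1 (about 3.30), norm = 0.25
        System.out.println(score(1.0, 4, 9, 99, 16, true, true, true));
        // Only term frequency left on.
        System.out.println(score(1.0, 4, 9, 99, 16, true, false, false)); // prints 2.0
    }
}
```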
2017-08-08 08:55:12 +02:00
Tal Levy 872526cad3 add URL-Decode Processor to Ingest (#26045)
closes #25837

Adds a URL Decoder Processor to Ingest

This will decode URLs like:

https%3a%2f%2felastic.co%2f to https://elastic.co/
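For illustration, the same percent-decoding can be reproduced with the JDK's java.net.URLDecoder. This is a sketch assuming UTF-8 input, not the processor's own implementation. Note that URLDecoder also turns + into a space, which matters for form-encoded query strings.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecodeSketch {

    // Percent-decodes a URL-encoded string as UTF-8.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("https%3a%2f%2felastic.co%2f")); // prints https://elastic.co/
    }
}
```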
2017-08-07 10:26:11 -07:00
Christoph Büscher 18155ed69a Merge branch 'master' into feature/rank-eval 2017-08-07 16:07:34 +02:00
Martijn van Groningen b88cfe2008
docs: Use Stack Exchange-based example to make documentation easier to understand 2017-08-04 16:04:26 +02:00
Zachary Tong 829f7cb658
CONSOLEify ip-range bucket agg docs
Related #18160
2017-08-03 17:19:54 -04:00
Zachary Tong e7eda5e1be
CONSOLEify scripted-metric agg docs
Related #18160
2017-08-03 17:19:54 -04:00
Aron Szanto 316cb42b21 Update shards_allocation.asciidoc (#26019)
Slight language and consistency updates in shard balancing heuristics
2017-08-03 11:27:02 +02:00
Zachary Tong d8414ffa29
CONSOLEify percentile and percentile-ranks docs
Related #18160
2017-08-02 17:47:27 -04:00
Zachary Tong 268923ebdc
CONSOLEify extended_stats docs
Related #18160
2017-08-02 16:13:30 -04:00
Jason Tedor 7066ec44ca Add recommendation on unicast hosts to docs
This commit adds a small note to the discovery docs recommending that
the unicast hosts list be maintained as the list of master-eligible
nodes in the cluster.

Relates #25991
2017-08-01 18:15:50 +09:00
Tanguy Leroux 9c8d3d3569 [Docs] Add migration notes for the high-level rest client (#25911) 2017-08-01 10:38:56 +02:00
Jason Tedor bc8dc683e4 Update config files docs
This commit updates the docs for the config files to explain the new
mechanism for customizing the configuration directory via the
environment variable CONF_DIR.

Relates #25990
2017-08-01 09:52:23 +09:00
Jason Tedor fd18e3239a Remove mention of http_address in nodes info docs
This commit removes an outdated reference to http_address in the nodes
info docs. This information is available in the http object for each
node in the nodes info API response.

Relates #25980
2017-07-31 22:04:16 +09:00
Jason Tedor 540413b24a Also skip JAVA_TOOL_OPTIONS on Windows
On non-Windows platforms, we ignore the environment variable
JAVA_TOOL_OPTIONS (this is an environment variable that the JVM respects
by default for picking up extra JVM options). The primary reason we
ignore it is the Jayatana agent on Ubuntu; a secondary reason is that
it produces an annoying "Picked up JAVA_TOOL_OPTIONS: ..." output
message. When the elasticsearch-env batch script was introduced
for Windows, ignoring this environment variable was deliberately not
carried over as the primary reason does not apply on Windows. However,
after additional thinking, it seems that we should simply be consistent
to the extent possible here (and also avoid that annoying "Picked up
JAVA_TOOL_OPTIONS: ..." on Windows too). This commit causes the Windows
version of elasticsearch-env to also ignore JAVA_TOOL_OPTIONS.
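The actual change lives in the elasticsearch-env shell and batch scripts. As a hypothetical Java analogue only, a launcher that starts a child JVM could strip the variable from the child's environment, so the child neither applies those options nor prints the "Picked up JAVA_TOOL_OPTIONS: ..." message:

```java
import java.util.Map;

class LaunchEnvSketch {

    // Prepares (but does not start) a child process whose environment
    // no longer contains JAVA_TOOL_OPTIONS.
    static ProcessBuilder withoutToolOptions(String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() is a mutable copy of the current process environment.
        pb.environment().remove("JAVA_TOOL_OPTIONS");
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = withoutToolOptions("java", "-version");
        Map<String, String> env = pb.environment();
        System.out.println(env.containsKey("JAVA_TOOL_OPTIONS")); // prints false
    }
}
```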

Relates #25968
2017-07-31 21:27:42 +09:00