146 Commits

Author SHA1 Message Date
Clinton Gormley ff4a2519f2 Update experimental labels in the docs (#25727)
Relates https://github.com/elastic/elasticsearch/issues/19798

Removed experimental label from:
* Painless
* Diversified Sampler Agg
* Sampler Agg
* Significant Terms Agg
* Terms Agg document count error and execution_hint
* Cardinality Agg precision_threshold
* Pipeline Aggregations
* index.shard.check_on_startup
* index.store.type (added warning)
* Preloading data into the file system cache
* foreach ingest processor
* Field caps API
* Profile API

Added experimental label to:
* Moving Average Agg Prediction


Changed experimental to beta for:
* Adjacency matrix agg
* Normalizers
* Tasks API
* Index sorting

Labelled experimental in Lucene:
* ICU plugin custom rules file
* Flatten graph token filter
* Synonym graph token filter
* Word delimiter graph token filter
* Simple pattern tokenizer
* Simple pattern split tokenizer

Replaced experimental label with warning that details may change in the future:
* Analysis explain output format
* Segments verbose output format
* Percentile Agg compression and HDR Histogram
* Percentile Rank Agg HDR Histogram
2017-07-18 14:06:22 +02:00
Simon Willnauer e81804cfa4 Add a shard filter search phase to pre-filter shards based on query rewriting (#25658)
Today, if we search across a large number of shards, we hit every shard. Yet it's quite
common to search across an index pattern for time-based indices where filtering will exclude
all results outside a certain time range, e.g. `now-3d`. While the search can potentially hit
hundreds of shards, the majority of the shards might yield 0 results since there is no document
that is within this date range. Kibana, for instance, does this regularly but used `_field_stats`
to narrow down the indices it needs to query. Now, with the deprecation of `_field_stats` and its upcoming removal, a single dashboard in Kibana can potentially turn into searches hitting hundreds or thousands of shards, and that can easily cause search rejections even though most of the requests are very likely super cheap and only need a query rewrite to terminate early with 0 results.

This change adds a pre-filter phase for searches that can, if the number of shards is higher than the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the Kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further, it's completely transparent to the user and improves the scalability of Elasticsearch in general on large clusters.
2017-07-12 22:19:20 +02:00
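The pre-filter phase described in e81804cfa4 can be illustrated with a minimal sketch. This is not the Elasticsearch implementation: the `Shard` class, the timestamp bounds, and both function names are hypothetical, and only the 128-shard default comes from the commit message. It shows the two properties the message calls out: the check runs only above the threshold, and a negative answer is definitive while a positive one may be a false positive.

```python
from dataclasses import dataclass

# Default threshold named in the commit message.
PRE_FILTER_SHARD_SIZE = 128


@dataclass
class Shard:
    """Hypothetical stand-in for a shard: just the timestamp bounds it holds."""
    name: str
    min_ts: int
    max_ts: int


def can_match(shard, range_from, range_to):
    """Cheap overlap check. True may be a false positive (the range overlaps
    but no document falls inside it); False means no match is possible."""
    return shard.max_ts >= range_from and shard.min_ts <= range_to


def shards_to_query(shards, range_from, range_to, threshold=PRE_FILTER_SHARD_SIZE):
    """Run the pre-filter round only when the shard count exceeds the threshold."""
    if len(shards) <= threshold:
        return list(shards)
    return [s for s in shards if can_match(s, range_from, range_to)]
```

With 200 hypothetical daily shards and a range covering only the last three, the query phase would touch three shards instead of 200; with 10 shards the pre-filter is skipped entirely.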
matarrese 2eafbaf759 Document aggregating by day of the week (#25602)
Add documentation for aggregating by day of the week.

Closes #24660
2017-07-07 14:16:53 -04:00
Clinton Gormley 0170e0e8d3 Remove usage of multi-types from the docs and added a page explaining type removal (#25543)
Closes #25401
2017-07-05 12:30:19 +02:00
Ryan Ernst a03b6c2fa5 Scripting: Change keys for inline/stored scripts to source/id (#25127)
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
2017-06-09 08:29:25 -07:00
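As a rough illustration of the key change in a03b6c2fa5, the sketch below rewrites a script object that uses the deprecated `inline`/`stored` keys into the `source`/`id` form. The helper name `migrate_script` is hypothetical, and the script body in the usage example is made up for illustration.

```python
# Deprecated key -> replacement key, per the commit title.
RENAMED_KEYS = {"inline": "source", "stored": "id"}


def migrate_script(script):
    """Return a copy of a script dict with deprecated keys renamed.
    Keys that are not deprecated (e.g. "lang", "params") pass through unchanged."""
    return {RENAMED_KEYS.get(key, key): value for key, value in script.items()}
```

For example, `migrate_script({"inline": "doc['n'].value * 2", "lang": "painless"})` yields the same script under the `source` key.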
Tanguy Leroux 528bd25fa7 Add superset size to Significant Term REST response (#24865)
This commit adds a new bg_count field to the REST response of
SignificantTerms aggregations. Similarly to the bg_count that already
exists in significant terms buckets, this new bg_count field is set at
the aggregation level and is populated with the superset size value.
2017-06-02 09:45:15 +02:00
markharwood b7197f5e21 SignificantText aggregation - like significant_terms, but for text (#24432)
* SignificantText aggregation - like significant_terms but doesn’t require fielddata=true. Recommended for use with the `sampler` agg to limit the expense of tokenizing docs. Takes an optional `filter_duplicate_text`:true setting to avoid stats skew from repeated sections of text in search results.

Closes #23674
2017-05-24 13:46:43 +01:00
Ryan Ernst 463fe2f4d4 Scripting: Remove file scripts (#24627)
This commit removes file scripts, which were deprecated in 5.5.

closes #21798
2017-05-17 14:42:25 -07:00
Zachary Tong a2845c86fe CONSOLEify some more aggregation docs
Related #18160
2017-05-16 17:25:24 -04:00
qwerty4030 e7d352b489 Compound order for histogram aggregations. (#22343)
This commit adds support for histogram and date_histogram agg compound order by refactoring and reusing terms agg order code. The major change is that the Terms.Order and Histogram.Order classes have been replaced/refactored into a new class BucketOrder. This is a breaking change for the Java Transport API. For backward compatibility with previous ES versions the (date)histogram compound order will use the first order. Also the _term and _time aggregation order keys have been deprecated; replaced by _key.

Relates to #20003: now that all these aggregations use the same order code, it should be easier to move validation to parse time (as a follow up PR).

Relates to #14771: histogram and date_histogram aggregation order will now be validated at reduce time.

Closes #23613: if a single BucketOrder that is not a tie-breaker is added with the Java Transport API, it will be converted into a CompoundOrder with a tie-breaker.
2017-05-11 18:06:26 +01:00
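The deprecation of `_term` and `_time` in favour of `_key` (e7d352b489) amounts to a key rename inside each order clause. A small sketch, assuming a compound order is written as a list of single-entry dicts; the helper name `migrate_order` is hypothetical.

```python
# Deprecated order keys and their replacement, per the commit message.
DEPRECATED_ORDER_KEYS = {"_term": "_key", "_time": "_key"}


def migrate_order(order):
    """Rewrite deprecated order keys in a compound order (a list of clauses).
    Clause order is preserved, so the first clause stays the primary sort."""
    return [
        {DEPRECATED_ORDER_KEYS.get(key, key): direction for key, direction in clause.items()}
        for clause in order
    ]
```

For instance, `[{"_count": "desc"}, {"_term": "asc"}]` becomes `[{"_count": "desc"}, {"_key": "asc"}]`.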
Adrien Grand 1be2800120 Only allow one type on 7.0 indices (#24317)
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.

Relates #15613
2017-04-27 08:43:20 +02:00
Suhas Karanth cee76295ca Update aggs reference documentation for 'keyed' options (#23758)
Add 'keyed' parameter documentation for following:
 - Date Histogram Aggregation
 - Date Range Aggregation
 - Geo Distance Aggregation
 - Histogram Aggregation
 - IP range aggregation
 - Percentiles Aggregation
 - Percentile Ranks Aggregation
2017-04-18 15:57:50 +02:00
Ulugbek Baymuradov 9cb477d387 Update filter-aggregation.asciidoc (#24138)
Fix a discrepancy between the example and the prose.
2017-04-17 18:46:13 -04:00
Nik Everett 5f91241f57 CONSOLEify geo aggregation docs
Turns the top example in each of the geo aggregation docs into a working
example that can be opened in CONSOLE. Subsequent examples can all also
be opened in console and will work after you've run the first example.
All examples are tested as part of the build.
2017-03-30 21:28:52 -04:00
Christoph Büscher 413bf05956 Docs: Add comma to reverse nested agg snippet 2017-03-17 14:07:18 +01:00
Randall Britten 98e19cced4 Docs: Corrected definition of type param of children agg (#23377) 2017-02-27 14:38:28 -05:00
Nik Everett 0c011cb290 Docs: CONSOLEify histogram aggregation docs
This adds the `COPY AS CURL` and `VIEW IN CONSOLE` links to the docs
and causes the snippets to be tested during Elasticsearch's build.

Relates to #18160
2017-02-07 16:09:32 -05:00
Jun Ohtani 7ea457955d Merge pull request #22879 from johtani/fix_documentation_error_in_date_histogram
[Doc]Not support "M" time unit in offset param
2017-02-03 16:40:08 +09:00
Nicholas Knize b41d5747f0 Reduce GeoDistance insanity
GeoDistance query, sort, and scripts make use of a crazy GeoDistance enum for handling 4 different ways of computing geo distance: SLOPPY_ARC, ARC, FACTOR, and PLANE. Only two of these are necessary: ARC, PLANE. This commit removes SLOPPY_ARC, and FACTOR and cleans up the way Geo distance is computed.
2017-02-02 12:39:42 -06:00
markharwood 9e8e556b08 Build fix for broken docs build 2017-01-31 10:27:06 +00:00
markharwood c0d525b108 [DOCS] [TEST] enhancement - added CONSOLE scripts for sampler aggs (#22869)
Added missing CONSOLE scripts to documentation for sampler and diversified_sampler aggs.
Includes new StackOverflow index setup in build.gradle

Closes #22746

* Formatting tweaks
2017-01-31 09:45:25 +00:00
Jun Ohtani 94933f9d19 [Doc]Not support "M" time unit in offset param 2017-01-31 18:23:38 +09:00
Mathieu Berube e0b8e45cc5 Fix typo - mergins to margins (#22839) 2017-01-30 13:52:32 +01:00
Nik Everett a99bddcc7e CONSOLE-ify filter aggregation docs
This adds the `VIEW IN CONSOLE` and `COPY AS CURL` links to the
snippet and causes the build to execute the snippet as a test.

Relates to #18160
2017-01-23 01:32:56 -05:00
Nik Everett 40e2645177 CONSOLE-ify date_range aggregation docs
This adds the `VIEW IN CONSOLE` and `COPY AS CURL` links to the
snippets in the docs for the `date_range` aggregation and tests
those snippets as part of the build.

Relates to #18160
2017-01-22 23:38:45 -05:00
Nik Everett f7524fbdef CONSOLE-ify date histogram docs
This adds the `VIEW IN SENSE` and `COPY AS CURL` links and has
the build automatically execute the snippets and verify that they
work.

Relates to #18160
2017-01-20 16:23:28 -05:00
Nik Everett 8c856eaa9f CONSOLE-ify global-aggregation.asciidoc
Adds the `VIEW IN CONSOLE` and `COPY AS CURL` links to the example
`global` aggregation. Also improves the example by adding a
non-`global` aggregation to compare it to.

Relates to #18160
2017-01-20 14:36:51 -05:00
markharwood f01784205f New AdjacencyMatrix aggregation
Similar to the Filters aggregation but only supports "keyed" filter buckets and automatically "ANDs" pairs of filters to produce a form of adjacency matrix.
The intersection of buckets "A" and "B" is named "A&B" (the choice of separator is configurable). Empty intersection buckets are removed from the final results.

Closes #22169
2017-01-20 15:49:31 +00:00
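The bucket naming described in f01784205f can be sketched outside Elasticsearch. The function below is an illustration only: it takes each filter as a set of matching document IDs, ANDs every pair, names the result with a configurable separator, and drops empty buckets. Emitting the single-filter buckets alongside the pairs and sorting filter names before pairing are assumptions of this sketch.

```python
from itertools import combinations


def adjacency_matrix(filters, separator="&"):
    """filters maps a filter name to the set of document IDs it matches.
    Returns a dict of bucket name -> document count, without empty buckets."""
    names = sorted(filters)
    buckets = {}
    # Single-filter buckets (assumption: kept next to the pairwise ones).
    for name in names:
        if filters[name]:
            buckets[name] = len(filters[name])
    # Pairwise intersections, e.g. "A&B"; empty intersections are skipped.
    for a, b in combinations(names, 2):
        both = filters[a] & filters[b]
        if both:
            buckets[f"{a}{separator}{b}"] = len(both)
    return buckets
```

With `{"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {9}}` the only pairwise bucket is `A&B`, because `A&C` and `B&C` are empty and therefore removed.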
Jim Ferenczi 433c822d4f Promote longs to doubles when a terms agg mixes decimal and non-decimal numbers (#22449)
* Promote longs to doubles when a terms agg mixes decimal and non-decimal numbers

This change makes the terms aggregation work when the buckets coming from different indices are a mix of decimal numbers and non-decimal numbers. In this case non-decimal numbers (longs) are promoted to decimal (double), which can result in a loss of precision for big numbers.

Fixes #22232
2017-01-10 11:50:56 +01:00
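The precision loss mentioned in 433c822d4f is a property of 64-bit doubles rather than of Elasticsearch, so it can be shown in plain Python, whose `float` is an IEEE 754 double. The helper `promote_mixed_keys` is a hypothetical stand-in for the promotion step.

```python
def promote_mixed_keys(keys):
    """If any key is a float, promote every key to float, mirroring the
    long -> double promotion; otherwise leave the integer keys alone."""
    if any(isinstance(key, float) for key in keys):
        return [float(key) for key in keys]
    return list(keys)


# 2**53 + 1 is the smallest positive integer a double cannot represent exactly.
big = 2**53 + 1
promoted = promote_mixed_keys([big, 1.5])
lost_precision = int(promoted[0]) != big
```

Here `lost_precision` is true: after promotion the key reads back as 2**53, so two distinct long keys above that bound can collapse into one bucket key.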
Adrien Grand 787519ee4c Fix `other_bucket` on the `filters` agg to be enabled if a key is set. (#21994)
Closes #21951
2016-12-09 09:48:48 +01:00
Colin Goodheart-Smithe 8006b105f3 Update order examples to use max instead of avg (#22032)
The use of the avg aggregation for sorting the terms aggregation is not encouraged since it has unbounded error. This changes the examples to use the max aggregation which does not suffer the same issues
2016-12-07 16:00:24 +00:00
markharwood aa60e5cc07 Aggregations - support for partitioning set of terms used in aggregations so that multiple requests can be done without trying to compute everything in one request.
Closes #21487
2016-11-24 15:10:46 +00:00
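Term partitioning as described in aa60e5cc07 can be sketched as hashing each term into one of N partitions and processing one partition per request. The use of CRC32 and the function names below are assumptions made for illustration; the commit message does not say which hash is used.

```python
import zlib


def partition_of(term, num_partitions):
    """Assign a term to a partition with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def terms_in_partition(terms, partition, num_partitions):
    """Keep only the terms that belong to the requested partition."""
    return [term for term in terms if partition_of(term, num_partitions) == partition]
```

Issuing one request per partition number from 0 to N-1 visits every term exactly once, which is what lets a large terms aggregation be split across several smaller requests.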
Chris Fritz 546fa92d61 Fix typo in filters aggregation docs (#21690) 2016-11-21 12:52:45 +01:00
Christoph Büscher 4ccd8e79c1 Docs: Clarify date_histogram bucket sizes for DST time zones
Added a warning note that clarifies bucket sizes diverging from the intended
`interval` size when using a time zone that has DST changes.

Closes #18805
2016-11-16 09:40:07 +01:00
Sumit Gupta e53405f4f3 Update geohashgrid-aggregation.asciidoc (#21530) 2016-11-15 10:49:02 +01:00
Clinton Gormley 30d342c87c Update significantterms-aggregation.asciidoc
Fix scripted significant terms example to use `params.` prefix for painless
2016-11-14 09:40:04 +01:00
markharwood dd21aa41be Docs fix - Diversified sampler agg had incorrect title and example
Closes #21347
2016-11-07 10:46:22 +00:00
Robin Clarke bbe6555b7a Docs: your -> you're (#20883) 2016-10-12 11:09:34 -04:00
Pascal Borreli fcb01deb34 Fixed typos (#20843) 2016-10-10 14:51:47 -06:00
Nik Everett 9271c0302f CONSOLEify some aggs docs
Cleans up the example result in `children-aggregation` so that
it matches the example data.

Relates to #18160
2016-10-03 09:22:56 -04:00
Nik Everett 5cff2a046d Remove most of the need for `// NOTCONSOLE`
and be much more stingy about what we consider a console candidate.

* Add `// CONSOLE` to check-running
* Fix version in some snippets
* Mark groovy snippets as groovy
* Fix versions in plugins
* Fix language marker errors
* Fix language parsing in snippets

  This adds support for snippets whose language is written like
  `[source, txt]` and `["source","js",subs="attributes,callouts"]`.

  This also makes language required for snippets which is nice because
  then we can be sure we can grep for snippets in a particular language.
2016-09-06 10:32:54 -04:00
Clinton Gormley de208cf78c Fixed bad asciidoc 2016-08-18 14:08:58 +02:00
Clinton Gormley 31e5e0b17f Document that pipeline aggs cannot be used for sorting
Closes #20037
2016-08-18 13:52:45 +02:00
Adrien Grand a0818d3b87 Split regular histograms from date histograms. #19551
Currently both aggregations really share the same implementation. This commit
splits the implementations so that regular histograms can support decimal
intervals/offsets and compute correct buckets for negative decimal values.

However, the response API is still the same. So, for instance, both regular
histograms and date histograms will produce an
`org.elasticsearch.search.aggregations.bucket.histogram.Histogram`
aggregation.

The optimization to compute an identifier of the rounded value and the
rounded value itself has been removed since it was only used by regular
histograms, which now do the rounding themselves instead of relying on the
Rounding abstraction.

Closes #8082
Closes #4847
2016-08-03 08:39:48 +02:00
Adrien Grand dcc598c414 Make the heuristic to compute the default shard size less aggressive.
The current heuristic to compute a default shard size is pretty aggressive,
it returns `max(10, number_of_shards * size)` as a value for the shard size.
I think making it less aggressive has the benefit that it would reduce the
likelihood of running into OOME when there are many shards (yearly
aggregations with time-based indices can make numbers of shards in the
thousands) and make the use of breadth-first more likely/efficient.

This commit replaces the heuristic with `size * 1.5 + 10`, which is enough
to have good accuracy on zipfian distributions.
2016-07-29 09:59:29 +02:00
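The two heuristics quoted in dcc598c414 are easy to compare side by side. The formulas come from the commit message; the function names and the truncation to an integer are assumptions of this sketch.

```python
def old_default_shard_size(size, number_of_shards):
    """Previous heuristic: grows linearly with the number of shards."""
    return max(10, number_of_shards * size)


def new_default_shard_size(size):
    """Replacement heuristic: independent of the number of shards.
    Truncating to int is an assumption of this sketch."""
    return int(size * 1.5 + 10)
```

For `size=10` across 1000 shards, the old heuristic asks every shard for 10000 buckets while the new one asks for 25, which is where the reduced memory pressure comes from.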
Jared McQueen d97b3fd817 [docs] missing a comma in the terms aggregation example 2016-07-27 12:59:38 -04:00
Leon Weidauer 1297a707da non-binary gender option in term aggr. example (#19188)
* non-binary gender option in term aggr. example

* replace gender with music genre for term aggregation docs
2016-07-01 14:59:03 +02:00
Jason Tedor 00356edd33 Clarify time units usage in docs
This commit clarifies the distinction between supported time units for
durations and supported time units for durations in the docs.

Relates #19159
2016-06-29 17:02:15 -04:00
Robert Muir 6fc1a22977 cutover some docs to painless 2016-06-27 09:55:16 -04:00
Jim Ferenczi fb2a48d0f0 Revert "Remove support for sorting terms aggregation by ascending count"
This is delayed after alpha4 since Kibana relies on it.
2016-06-17 17:14:01 +02:00
Jim Ferenczi 755721953b Remove support for sorting terms aggregation by ascending count
closes #17614
2016-06-17 15:06:49 +02:00
Glen Smith 5284c5094d grammar 2016-06-17 10:09:21 +02:00
Jim Ferenczi ad232aebbe Set collection mode to breadth_first in the terms aggregation when the cardinality of the field is unknown or smaller than the requested size.
closes #9825
2016-06-16 11:33:40 +02:00
Colin Goodheart-Smithe cfd3356ee3 Remove size 0 options in aggregations
This removes the ability to set `size: 0` in the `terms`, `significant_terms` and `geohash_grid` aggregations for the reasons described in https://github.com/elastic/elasticsearch/issues/18838

Closes #18838
2016-06-14 13:07:02 +01:00
Adrien Grand 638da06c1d Add back support for `ip` range aggregations. #17859
This commit adds support for range aggregations on `ip` fields. However it will
only work on 5.x indices.

Closes #17700
2016-05-13 17:22:01 +02:00
Robert Muir c5532d3df0 add a rest test for this that seems to work, fix the documentation. thanks @s1monw 2016-05-11 16:07:08 -04:00
Jim Ferenczi 052191f2a2 Add the ability to use the breadth_first mode with nested aggregations (such as `top_hits`) which require access to score information.
The score is recomputed lazily for each document belonging to a top bucket.
Relates to #9825
2016-05-04 15:35:45 +02:00
Sergii Golubev 434a563fe0 terms-aggregation.asciidoc tiny edit 2016-04-13 16:51:47 -06:00
Sergii Golubev 39b914bd77 histogram-aggregation.asciidoc: tiny edit (#17706) 2016-04-13 14:19:05 +02:00
Adrien Grand 1d0239c125 Add a warning about the impact of sorting terms aggregations on the accuracy of doc counts. 2016-04-07 16:57:44 +02:00
Adrien Grand b42f66c8ac Document 5.0 mapping changes. 2016-03-22 16:22:58 +01:00
Clinton Gormley 0ed0fea558 Updated link to Joda time zones 2016-03-14 12:24:58 +01:00
Christoph Büscher ff46303f15 Simplify mock scripts 2016-03-07 15:39:35 +01:00
Christoph Büscher 6b0f63e1a6 Adding `time_zone` parameter to daterange-aggregation docs 2016-03-07 15:38:24 +01:00
Colin Goodheart-Smithe e546db0753 [DOCS] fix to sampler agg documentation 2016-02-15 13:17:19 +00:00
Colin Goodheart-Smithe 5f489b99bf fixed docs link error 2016-02-15 12:12:16 +00:00
Colin Goodheart-Smithe 1f760bd1bd Merge branch 'master' into feature/aggs-refactoring 2016-02-10 12:16:26 +00:00
Dongjoon Hyun 21ea552070 Fix typos in docs. 2016-02-09 02:07:32 -08:00
Colin Goodheart-Smithe 3b35754f59 Merge branch 'master' into feature/aggs-refactoring
# Conflicts:
#	core/src/test/java/org/elasticsearch/percolator/PercolateDocumentParserTests.java
2016-01-26 13:17:53 +00:00
Clinton Gormley 7cde0d47bc Merge pull request #16215 from eemp/patch-1
Update filters-aggregation.asciidoc
2016-01-26 12:56:43 +01:00
Colin Goodheart-Smithe cd8320b171 Merge branch 'master' into feature/aggs-refactoring
# Conflicts:
#	core/src/main/java/org/elasticsearch/search/aggregations/bucket/filter/FilterAggregator.java
#	core/src/main/java/org/elasticsearch/search/aggregations/bucket/filters/FiltersAggregator.java
#	core/src/main/java/org/elasticsearch/search/SearchModule.java
2016-01-25 10:42:20 +00:00
Kevin Adams 768d171f77 Timezone: use forward slash
Using a backslash causes errors when querying Elasticsearch; changing the backslash to a forward slash in the time zone fixes it.

Closes #16148
2016-01-22 14:26:49 +01:00
Colin Goodheart-Smithe 2c33f78192 Merge branch 'master' into feature/aggs-refactoring
# Conflicts:
#	core/src/main/java/org/elasticsearch/search/aggregations/bucket/children/ChildrenParser.java
#	core/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceParser.java
#	test/framework/src/main/java/org/elasticsearch/test/TestSearchContext.java
2016-01-06 09:35:53 +00:00
Eugene Pirogov d48af9a155 Fix indent in example
Previously it looked as if the `warnings` key were nested under `errors`.
2016-01-05 14:41:09 +01:00
Colin Goodheart-Smithe 1aea0faa86 Aggregations Refactor: Refactor Sampler Aggregation 2015-12-21 09:35:46 +00:00
Clinton Gormley 3e7201ef63 Merge pull request #14096 from speedplane/patch-2
Fixed a typo ("when when")
2015-10-13 21:17:09 +02:00
Alex 4077a322c5 Docs: Fix typo - datehistogram
date_histogram in place of datehistogram

Closes #13886
2015-10-06 19:22:21 +02:00
Taehee Kim 45e0ccd274 Fix typo 2015-09-25 06:42:21 +09:00
Adrien Grand 86f1b07df0 Docs: Remove docs for the `filtered`, `and`, `or` and `(f)query` queries. 2015-09-11 11:00:54 +02:00
Clinton Gormley 8aba6ce93a Docs: Improved the date histogram docs for time_zone and offset 2015-09-07 19:54:00 +02:00
Sylvain Zimmer c2f774ac57 Warning in the docs for negative histogram values
As requested in https://github.com/elastic/elasticsearch/issues/8082#issuecomment-127962374
2015-08-07 13:10:03 +02:00
Clinton Gormley ac2b8951c6 Docs: Mapping docs completely rewritten for 2.0 2015-08-06 17:24:51 +02:00
Sylvain Zimmer 12a2db5417 Fix typo in docs 2015-07-31 19:11:04 -04:00
Ryan Ernst dba42a83e2 Docs: Update time_zone specification
closes #12317
2015-07-21 00:22:53 -07:00
Colin Goodheart-Smithe e366d0380d Aggregations: Adds other bucket to filters aggregation
The filters aggregation now has an option to add an 'other' bucket which will, when turned on, contain all documents which do not match any of the defined filters. There is also an option to change the name of the 'other' bucket from the default of '_other_'

Closes #11289
2015-07-01 10:44:04 +01:00
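The 'other' bucket behaviour from e366d0380d can be sketched with plain predicates standing in for filters. The `_other_` default comes from the commit message; the function name and parameter names are illustrative and not taken from it.

```python
def filters_agg(docs, filters, other_bucket=False, other_bucket_key="_other_"):
    """filters maps a bucket name to a predicate over a document.
    A document may land in several buckets; documents matching no filter
    are counted in the 'other' bucket when it is turned on."""
    buckets = {name: 0 for name in filters}
    other = 0
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                buckets[name] += 1
                matched = True
        if not matched:
            other += 1
    if other_bucket:
        buckets[other_bucket_key] = other
    return buckets
```

Leaving `other_bucket` off reproduces the earlier behaviour, where unmatched documents simply do not appear in the response.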
William Li 2be3fe31a4 Docs: Update filter-aggregation.asciidoc
Closes #11782
2015-07-01 10:17:45 +02:00
Christoph Büscher f5f73259e4 Docs: Update Joda URLs in documentation. 2015-06-26 10:23:02 +02:00
Clinton Gormley 37eae789a0 Merge pull request #11801 from golubev/patch-6
fix json syntax in filters-aggregation.asciidoc
2015-06-23 20:02:04 +02:00
caldwecr 1ac728d22b Docs: Update filter-aggregation.asciidoc
Replace the previous example, which used a range filter and caused unnecessary confusion about when to use a range filter to create a single bucket versus a range aggregation with exactly one member in ranges.

Closes #11704
2015-06-19 12:24:42 +02:00
Clinton Gormley 64ec18afa0 Merge pull request #11661 from pjcard/patch-1
Make explicit the requirement for intervals to be integers
Conflicts:
	docs/reference/search/aggregations/bucket/histogram-aggregation.asciidoc
2015-06-15 11:42:12 +02:00
Colin Goodheart-Smithe 35a58d874e Scripting: Unify script and template requests across codebase
This change unifies the way scripts and templates are specified for all instances in the codebase. It builds on the Script class added previously and adds request building and parsing support as well as the ability to transfer script objects between nodes. It also adds a Template class which aims to provide the same functionality for template APIs

Closes #11091
2015-05-29 16:52:04 +01:00
Adrien Grand 32e23b9100 Aggs: Make it possible to configure missing values.
Most aggregations (terms, histogram, stats, percentiles, geohash-grid) now
support a new `missing` option which defines the value to consider when a
field does not have a value. This can be handy if, e.g., you want a terms
aggregation to handle the same way documents that have "N/A" or no value
for a `tag` field.

This works in a very similar way to the `missing` option on the `sort`
element.

One known issue is that this option sometimes cannot make the right decision
in the unmapped case: it needs to replace all values with the `missing` value
but might not know what kind of values source should be produced (numerics,
strings, geo points?). For this reason, we might want to add an `unmapped_type`
option in the future like we did for sorting.

Related to #5324
2015-05-15 16:26:58 +02:00
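The effect of the `missing` option from 32e23b9100 on a terms aggregation can be sketched as a default substituted before counting. This is an illustration only; the function name `terms_agg` and the dict-based documents are assumptions, while the "N/A" and `tag` example comes from the commit message.

```python
from collections import Counter


def terms_agg(docs, field, missing=None):
    """Count documents per value of `field`. Documents without the field are
    skipped unless `missing` is given, in which case they count under it."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            if missing is None:
                continue
            value = missing
        counts[value] += 1
    return dict(counts)
```

With `missing="N/A"`, a document that has no `tag` lands in the same bucket as one whose `tag` is literally "N/A".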
Adrien Grand a0af88e996 Query DSL: Remove filter parsers.
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed us to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.
2015-05-07 20:14:34 +02:00
Pascal Borreli af6d890ad5 Docs: Fixed typos
Closes #10973
2015-05-05 10:38:05 +02:00
Zachary Tong 967e05ea76 [DOCS] Fix section levels for Sampler agg 2015-05-04 09:18:24 -04:00
Zachary Tong e3ae1df6f0 [DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00