336 Commits

Author SHA1 Message Date
Adrien Grand e5be85d586 Aggs: Change the default `min_doc_count` to 0 on histograms.
The assumption is that gaps in a histogram are generally undesirable, for instance
if you want to build a visualization from it. Additionally, we are building new
aggregations that require gap-free histograms to work correctly (e.g.
derivatives).
2015-04-30 15:48:23 +02:00
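To illustrate what the `min_doc_count` change above implies, here is a minimal Python sketch: with a minimum document count of 0, every interval between the lowest and highest populated bucket is returned, with a count of zero where no documents fall. The function name `fill_gaps` and the sample data are hypothetical, and this is not the Elasticsearch implementation.

```python
def fill_gaps(counts, interval):
    """Return (key, doc_count) pairs for every bucket between the lowest and
    highest populated keys, inserting zero-count buckets for the gaps."""
    if not counts:
        return []
    lowest, highest = min(counts), max(counts)
    return [(key, counts.get(key, 0))
            for key in range(lowest, highest + interval, interval)]


# Hypothetical sparse histogram: bucket key -> document count.
sparse = {0: 3, 10: 1, 40: 2}
filled = fill_gaps(sparse, 10)
```

A consumer such as a derivative can then walk `filled` pairwise without special-casing missing keys.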
Colin Goodheart-Smithe 969f53e399 fix typo in Min bucket aggregation docs 2015-04-30 14:41:01 +01:00
Colin Goodheart-Smithe d16bf992a9 Aggregations: min_bucket aggregation
An aggregation to calculate the minimum value in a set of buckets.

Closes #9999
2015-04-30 13:34:21 +01:00
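The idea behind the `min_bucket` commit above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bucket layout, the metric name `sales`, and the choice to also return the keys of the matching buckets are hypothetical and not taken from the commit.

```python
def min_bucket(buckets, metric):
    """Return the minimum metric value across buckets and the keys of the
    buckets that hold it. Buckets without the metric are skipped."""
    values = [(b["key"], b[metric]) for b in buckets if b.get(metric) is not None]
    if not values:
        return None, []
    lowest = min(value for _, value in values)
    return lowest, [key for key, value in values if value == lowest]


# Hypothetical monthly buckets with a "sales" metric.
buckets = [
    {"key": "2015-01", "sales": 550.0},
    {"key": "2015-02", "sales": 60.0},
    {"key": "2015-03", "sales": 375.0},
]
value, keys = min_bucket(buckets, "sales")
```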
Zachary Tong 351a4d3315 [DOCS] Fix movavg images and naming 2015-04-29 13:33:54 -04:00
Colin Goodheart-Smithe 57a8885964 Merge branch 'master' into feature/aggs_2_0
# Conflicts:
#	src/main/java/org/elasticsearch/index/query/CommonTermsQueryBuilder.java
#	src/main/java/org/elasticsearch/search/aggregations/AggregationModule.java
#	src/main/java/org/elasticsearch/search/aggregations/AggregatorFactories.java
#	src/main/java/org/elasticsearch/search/aggregations/AggregatorParsers.java
#	src/main/java/org/elasticsearch/search/aggregations/InternalMultiBucketAggregation.java
#	src/main/java/org/elasticsearch/search/aggregations/bucket/nested/NestedAggregator.java
#	src/main/java/org/elasticsearch/search/aggregations/metrics/InternalNumericMetricsAggregation.java
#	src/test/java/org/elasticsearch/search/aggregations/bucket/nested/NestedAggregatorTest.java
2015-04-29 15:49:41 +01:00
Antonio Bonuccelli ab83eb036b Docs: adding missing single quote on PUT index request
Closes #10876
2015-04-29 14:45:25 +02:00
Zachary Tong bf9739d0f0 [DOCS] review comment fixes 2015-04-27 14:40:04 -04:00
Clinton Gormley 37ed61807f Docs: Updated the experimental annotations in the docs as follows:
* Removed the docs for `index.compound_format` and `index.compound_on_flush` - these are expert settings which should probably be removed (see https://github.com/elastic/elasticsearch/issues/10778)
* Removed the docs for `index.index_concurrency` - another expert setting
* Labelled the segments verbose output as experimental
* Marked the `compression`, `precision_threshold` and `rehash` options as experimental in the cardinality and percentile aggs
* Improved the experimental text on `significant_terms`, `execution_hint` in the terms agg, and `terminate_after` param on count and search
* Removed the experimental flag on the `geobounds` agg
* Marked the settings in the `merge` and `store` modules as experimental, rather than the modules themselves

Closes #10782
2015-04-26 18:49:15 +02:00
Clinton Gormley f1a0e2216a Docs: Mentioned script_id and script_file parameters across all aggs
Closes #10760
2015-04-26 17:30:38 +02:00
Clinton Gormley 7de8b7008e Docs: Tidied docs for field-stats 2015-04-26 15:52:02 +02:00
Mehdi Mollaverdi dce920b75f Docs: The name of scroll ID attribute in the response is "_scroll_id" rather than "scroll_id"
Closes #10691
2015-04-25 19:32:32 +02:00
Mal Curtis 9eabcd7c0f Docs: Fix missing comma in context suggester docs
Closes #10623
2015-04-23 14:04:46 +02:00
Martijn van Groningen dbeb4aaacf docs: make sure that the options are rendered correctly 2015-04-23 10:50:01 +02:00
Martijn van Groningen 6a2f9c2682 docs: fixed title out of sequence 2015-04-23 09:57:31 +02:00
Martijn van Groningen 5705537ecf Added field stats api
The field stats API returns field-level statistics such as the lowest and highest values and the number of documents that have at least one value for a field.

An API like this can be useful for exploring a data set you don't know much about. For example, you can figure out what the lowest and highest response times are, so that you can create a histogram or range aggregation with sane settings.

This API doesn't run a search to figure these statistics out, but rather uses the Lucene index to look them up (using the Terms class in Lucene). So finding out these stats for fields is cheap and quick.

The min/max values are based on the type of the field: for a numeric field the min/max are numbers, for a date field they are dates, and for other fields they are term based.

Closes #10523
2015-04-23 08:52:34 +02:00
Zachary Tong e08e45cee8 [DOCS] Add link to movavg page 2015-04-22 18:59:39 -04:00
Zachary Tong a03cefcece [DOCS] Add documentation for moving average 2015-04-22 18:59:39 -04:00
Clinton Gormley a60571c597 Docs: Removed some unused callout from the scroll docs 2015-04-22 12:49:06 +02:00
Jun Ohtani 0955c127c0 Rest: Add json in request body to scroll, clear scroll, and analyze API
Change analyze.asciidoc and scroll.asciidoc
Add JSON support to the analyze, scroll, and clear scroll APIs
Add rest-api-spec/test

Closes #5866
2015-04-22 17:53:20 +09:00
Colin Goodheart-Smithe bd28c9c44e Documentation for the max_bucket reducer 2015-04-21 15:06:20 +01:00
Colin Goodheart-Smithe be647a89d3 Documentation for the derivative reducer 2015-04-21 15:06:20 +01:00
Colin Goodheart-Smithe 0f4b7f3b5c Added section for reducer aggregations in the main aggregation docs page 2015-04-21 15:06:19 +01:00
markharwood 63db34f649 New feature - Sampler aggregation used to limit any nested aggregations' processing to a sample of the top-scoring documents.
Optionally, a “diversify” setting can limit the number of collected matches that share a common value such as an "author".

Closes #8108
2015-04-21 10:22:05 +01:00
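The sampling behaviour described in the sampler commit above can be sketched as follows. This is a simplified Python illustration, not the Elasticsearch implementation: the function and parameter names (`sample_top_hits`, `sample_size`, `diversify_field`, `max_per_value`) and the hit layout are assumptions made for the example.

```python
def sample_top_hits(hits, sample_size, diversify_field=None, max_per_value=1):
    """Keep the top-scoring hits, optionally capping how many hits may share
    the same value of diversify_field (for example an author)."""
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    sample, seen = [], {}
    for hit in ranked:
        if len(sample) == sample_size:
            break
        if diversify_field is not None:
            value = hit.get(diversify_field)
            if seen.get(value, 0) >= max_per_value:
                continue  # this value is already represented often enough
            seen[value] = seen.get(value, 0) + 1
        sample.append(hit)
    return sample


# Hypothetical scored hits.
hits = [
    {"id": 1, "score": 4.0, "author": "a"},
    {"id": 2, "score": 3.0, "author": "a"},
    {"id": 3, "score": 2.0, "author": "b"},
    {"id": 4, "score": 1.0, "author": "c"},
]
plain = sample_top_hits(hits, 2)
diverse = sample_top_hits(hits, 2, diversify_field="author")
```

Any nested aggregation would then be computed over the sample only, which is what bounds its cost.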
Adrien Grand f4d5914511 Docs: Warn about the fact that min_doc_count=0 might return terms that only belong to different types. 2015-04-21 00:57:57 +02:00
Honza Král e929c1560d [DOCS] Be explicit about scan doing no scoring 2015-04-20 18:05:45 +02:00
Alex Ksikes c347dfe91c Validate API: support for verbose explanation of successfully validated queries
This commit adds a `rewrite` parameter to the validate API in order to show
how the given query is rewritten into primitive queries. For example, an MLT
query is rewritten into a disjunction of the selected terms. Other use cases
include `fuzzy`, `common_terms`, or `match` queries, especially with a
`cutoff_frequency` parameter. Note that the explanation is given for a
single randomly chosen shard only, so the output may vary from one shard to
another.

Relates #1412
Closes #10147
2015-04-13 19:17:58 +02:00
Clinton Gormley abc7de96ae Docs: Updated version annotations in master 2015-04-09 14:50:11 +02:00
Adrien Grand aecd9ac515 Aggregations: Speed up include/exclude in terms aggregations with regexps.
Today we check every regular expression eagerly against every possible term.
This can be very slow if you have lots of unique terms, and even the bottleneck
if your query is selective.

This commit switches to Lucene regular expressions instead of Java (not exactly
the same syntax yet most existing regular expressions should keep working) and
uses the same logic as RegExpQuery to intersect the regular expression with the
terms dictionary. I wrote a quick benchmark (in the PR) to make sure it made
things faster and the same request that took 750ms on master now takes 74ms with
this change.

Close #7526
2015-04-09 12:12:56 +02:00
marko asplund 5585175173 Docs: fix typos in example JSON data
Closes #10479
2015-04-08 13:40:35 +02:00
Adrien Grand a608db122d Search: Remove the `count` search type.
This commit brings the benefits of the `count` search type to search requests
that have a `size` of 0:
 - a single round-trip to shards (no fetch phase)
 - ability to use the query cache

Since `count` now provides no benefits over `query_then_fetch`, it has been
deprecated.

Close #7630
2015-03-31 11:31:49 +02:00
olivier bourgain 00a9db73ae [DOCS] Fix multi percolate response sample in percolate.asciidoc 2015-03-30 11:32:41 +02:00
javanna d9d1e6a67a Scripting: add support for fine-grained settings
Allow scripting to be turned on or off based on the script source (where scripts get loaded from), the operation that executes them, and their language.

The settings cover the following combinations:

- mode: on, off, sandbox
- source: indexed, dynamic, file
- engine: groovy, expressions, mustache, etc
- operation: update, search, aggs, mapping

The following settings are supported for every engine:

script.engine.groovy.indexed.update:    sandbox/on/off
script.engine.groovy.indexed.search:    sandbox/on/off
script.engine.groovy.indexed.aggs:      sandbox/on/off
script.engine.groovy.indexed.mapping:   sandbox/on/off
script.engine.groovy.dynamic.update:    sandbox/on/off
script.engine.groovy.dynamic.search:    sandbox/on/off
script.engine.groovy.dynamic.aggs:      sandbox/on/off
script.engine.groovy.dynamic.mapping:   sandbox/on/off
script.engine.groovy.file.update:       sandbox/on/off
script.engine.groovy.file.search:       sandbox/on/off
script.engine.groovy.file.aggs:         sandbox/on/off
script.engine.groovy.file.mapping:      sandbox/on/off

For ease of use, the following more generic settings are supported too:

script.indexed: sandbox/on/off
script.dynamic: sandbox/on/off
script.file:    sandbox/on/off

script.update:  sandbox/on/off
script.search:  sandbox/on/off
script.aggs:    sandbox/on/off
script.mapping: sandbox/on/off

These will be used to calculate the more specific settings, using the stricter setting of each combination. Operation based settings have precedence over conflicting source based ones.

Note that the `mustache` engine is affected by generic settings applied to any language, while native scripts aren't as they are static by definition.

Also, the previous `script.disable_dynamic` setting can now be deprecated.

Closes #6418
Closes #10116
Closes #10274
2015-03-26 19:56:55 +01:00
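The per-engine setting names listed in the scripting commit above follow the pattern `script.engine.<engine>.<source>.<operation>`. As a small Python sketch, the twelve keys for one engine can be generated from the source and operation lists given in the commit message. The helper name `engine_setting_keys` is hypothetical, and the sketch only builds the key names; it does not attempt to model how the generic settings are resolved.

```python
from itertools import product

# Sources and operations as listed in the commit message.
SOURCES = ("indexed", "dynamic", "file")
OPERATIONS = ("update", "search", "aggs", "mapping")


def engine_setting_keys(engine):
    """Build every script.engine.<engine>.<source>.<operation> setting name."""
    return [f"script.engine.{engine}.{source}.{operation}"
            for source, operation in product(SOURCES, OPERATIONS)]


keys = engine_setting_keys("groovy")
```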
Boaz Leskes 4970e3e225 Revert "Rest: Add json in request body to scroll, clear scroll, and analyze API"
This reverts commit 16083d454c.
2015-03-23 12:57:19 +01:00
Jun Ohtani 16083d454c Rest: Add json in request body to scroll, clear scroll, and analyze API
Add json support to scroll, clear scroll, and analyze

Closes #5866
2015-03-23 15:35:38 +09:00
Simon Willnauer 7257345db9 Revert Benchmark API
The benchmark API is being worked on in the feature/bench branch and will be merged from there when ready.
2015-03-21 10:36:04 +01:00
Asimov4 649e3aa4c5 [DOCS] Fix typos in percolate.asciidoc 2015-03-21 10:23:15 +01:00
Martijn van Groningen 4393939f5e inner_hits: Nested parent field should be resolved based on the parent inner hit definition, instead of the nested parent field in the mapping.
The behaviour is better when someone has multiple levels of nested object fields defined in the mapping and would like to define a single inner_hits definition that is two or more levels deep.

If someone wants inner hits on a nested field that is 2 levels deep the following would need to be defined:

```
{
  ...
  "inner_hits" : {
     "path" : {
        "level1" : {
            "inner_hits" : {
               "path" : {
                  "level2" : {
                     "query" : { .... }
                  }
               }
            }
        }
     }
  }
}
```

With this change the above can be defined as:

```
{
  ...
  "inner_hits" : {
     "path" : {
        "level1.level2" : {
            "query" : { .... }
        }
     }
  }
}
```

Closes #9251
2015-03-16 16:31:03 -07:00
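The relationship between the two request shapes in the inner_hits commit above can be sketched in Python as a conversion from the nested form to the dotted-path form. This is only an illustration of the shapes, not code from Elasticsearch: the helper name `flatten_inner_hits` is hypothetical, the `match_all` query is a placeholder for the elided query, and a level that carries both its own query and a nested `inner_hits` is deliberately not handled.

```python
def flatten_inner_hits(definition, prefix=""):
    """Collapse nested inner_hits path definitions into dotted paths."""
    flat = {}
    for path, body in definition["inner_hits"]["path"].items():
        full_path = f"{prefix}.{path}" if prefix else path
        if "inner_hits" in body:
            # Descend one nesting level, carrying the accumulated path.
            flat.update(flatten_inner_hits(body, full_path))
        else:
            flat[full_path] = body
    return flat


# Placeholder query; the commit message elides the real one.
nested = {
    "inner_hits": {
        "path": {
            "level1": {
                "inner_hits": {
                    "path": {"level2": {"query": {"match_all": {}}}}
                }
            }
        }
    }
}
flat = {"inner_hits": {"path": flatten_inner_hits(nested)}}
```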
Lee Hinman 6aec68cd29 Revert "[QUERY] Remove lowercase_expanded_terms and locale options"
This reverts commit d1f7bd97cb.

Ryan pointed out that this needs to work with the multi term query, so
additional analysis and tests should be added.
2015-03-13 13:51:44 -06:00
Lee Hinman d1f7bd97cb [QUERY] Remove lowercase_expanded_terms and locale options
The analysis chain should be used instead of relying on this, as it is
confusing when dealing with different per-field analysers.

The `locale` option was only used for `lowercase_expanded_terms`, which,
once removed, is no longer needed, so it was removed as well.

Fixes #9978
Relates to #9973
2015-03-13 13:17:27 -06:00
olivier bourgain bcb4decca9 [DOCS] add missing comma in percentile_rank aggregation example 2015-03-10 08:21:06 -07:00
olivier bourgain fb7cd2ea9a [DOCS] Adjusted geo_distance aggregation example
`unit` is not returned in the response, but we have `key` and an implicit `from` starting at 0 for the first bucket
2015-03-10 08:20:20 -07:00
olivier bourgain eaeddc6bd4 [DOCS] missing curly brace in ip_range aggregation example 2015-03-10 08:19:57 -07:00
Britta Weber 580728dfd6 significant terms: add scriptable significance heuristic
This commit adds scripting capability to significant_terms.
Custom heuristics can be implemented with a script that provides
parameters subset_freq, superset_freq, subset_size, superset_size.

closes #7850
2015-03-06 17:06:04 +01:00
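As an illustration of what a custom heuristic built on the four parameters named in the commit above might compute, here is a Python sketch. The formula (the term's share of the subset divided by its share of the superset) and the function name are assumptions chosen for the example; they are not a heuristic defined by the commit, and a real script would be written in a supported scripting language rather than Python.

```python
def relative_frequency_score(subset_freq, subset_size, superset_freq, superset_size):
    """Score a term by how much more frequent it is in the subset
    than in the superset. Returns 0.0 when a ratio is undefined."""
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    foreground = subset_freq / subset_size      # share of subset docs with the term
    background = superset_freq / superset_size  # share of superset docs with the term
    return foreground / background


# Hypothetical counts: the term is in 25 of 100 subset docs
# and in 125 of 1000 superset docs.
score = relative_frequency_score(25, 100, 125, 1000)
```

A score above 1 means the term is over-represented in the subset relative to the superset.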
Clinton Gormley c223ed0db4 Update search-type.asciidoc
Changed search_type docs to reflect that the `(dfs_)query_and_fetch` modes are an internal optimization and should not be specified explicitly by the user.

Relates to #9606
2015-03-02 10:55:22 +01:00
Geoff Bourne 0e09c02c56 Spelling out the sort order options
Closes #9768
2015-03-01 21:05:52 +01:00
Clinton Gormley e194fb3a07 Docs: Default distance unit in geo distance agg is metres, not km
Closes #9812
2015-02-28 01:45:29 +01:00
Colin Goodheart-Smithe 2520dc78ec [DOCS] added a note for the default shard_size value 2015-02-25 11:00:55 +00:00
markharwood 29b1902cfb New aggregations feature - “PercentageScore” heuristic for significant_terms aggregation provides simple “per-capita” type measures.
Closes #9720
2015-02-20 13:22:08 +00:00
Christoph Büscher 30fd70f07b Aggregations: Simplify time zone option in `date_histogram`
Removed the existing `pre_zone` and `post_zone` options in `date_histogram` in favor of
the simpler `time_zone` option. Previously, specifying different values for these could
lead to confusing scenarios where ES would return bucket keys that are not UTC.
Now `time_zone` is the only option: date buckets are calculated in the
preferred time zone, and after rounding the bucket key values are converted back to UTC.

Closes #9062
Closes #9637
2015-02-16 16:54:06 +01:00
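The rounding order described in the `time_zone` commit above (shift into the preferred zone, round, convert the key back to UTC) can be sketched in Python for daily buckets. This is a simplified illustration, not the Elasticsearch implementation: the function name `day_bucket_key` is hypothetical, and a fixed UTC+1 offset stands in for a real time zone, so daylight saving transitions are ignored.

```python
from datetime import datetime, timedelta, timezone


def day_bucket_key(timestamp_utc, tz):
    """Round a UTC timestamp down to the start of its day in tz,
    then express that bucket key in UTC again."""
    local = timestamp_utc.astimezone(tz)
    local_midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    return local_midnight.astimezone(timezone.utc)


# Fixed offset used as a stand-in for the preferred time zone (assumption).
utc_plus_one = timezone(timedelta(hours=1))
timestamp = datetime(2015, 2, 16, 23, 30, tzinfo=timezone.utc)
key = day_bucket_key(timestamp, utc_plus_one)
```

Here 23:30 UTC already belongs to 17 February in the UTC+1 zone, so the bucket key is that local midnight expressed in UTC, 2015-02-16 23:00.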
Clinton Gormley 6fadeeca56 Updated doc annotations for 1.4.3 2015-02-11 17:54:53 +01:00