4471 Commits

Author SHA1 Message Date
Colin Goodheart-Smithe 0edb096eb4 Adds a new auto-interval date histogram (#28993)
* Adds a new auto-interval date histogram

This change adds a new type of histogram aggregation called `auto_date_histogram` where you can specify the target number of buckets you require and it will find an appropriate interval for the returned buckets. The aggregation works by first collecting documents in buckets at second interval. When it has created more than the target number of buckets, it merges these buckets into minute interval buckets and continues collecting until it reaches the target number of buckets again. It will keep merging buckets when it exceeds the target until either collection is finished or the highest interval (currently years) is reached. A similar process happens at reduce time.

This aggregation intentionally does not support min_doc_count, offset and extended_bounds to keep the already complex logic from becoming more complex. The aggregation accepts sub-aggregations but will always operate in `breadth_first` mode deferring the computation of sub-aggregations until the final buckets from the shard are known. min_doc_count is effectively hard-coded to zero meaning that we will insert empty buckets where necessary.
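
The collect-and-merge behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the actual aggregator: the interval ladder uses fixed-length seconds (real calendar intervals vary in length), empty buckets are not inserted, and `auto_buckets` and `INTERVALS` are hypothetical names.

```python
from collections import Counter

# Hypothetical interval ladder in seconds; fixed lengths are a simplification.
INTERVALS = [1, 60, 3600, 86400]  # second, minute, hour, day


def auto_buckets(timestamps, target):
    """Bucket epoch-second timestamps, coarsening the interval whenever
    the number of buckets exceeds the target."""
    level = 0
    buckets = Counter()
    for ts in timestamps:
        size = INTERVALS[level]
        buckets[ts - ts % size] += 1
        # Merge into the next coarser interval while over the target.
        while len(buckets) > target and level < len(INTERVALS) - 1:
            level += 1
            size = INTERVALS[level]
            merged = Counter()
            for key, count in buckets.items():
                merged[key - key % size] += count
            buckets = merged
    return INTERVALS[level], dict(sorted(buckets.items()))
```

Calling `auto_buckets([0, 1, 2, 61, 62, 125], target=3)` starts at the second interval and switches to the minute interval once a fourth bucket appears.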

Closes #9572

* Adds documentation

* Added sub aggregator test

* Fixes failing docs test

* Brings branch up to date with master changes

* trying to get tests to pass again

* Fixes multiBucketConsumer accounting

* Collects more buckets than needed on shards

This gives us more options at reduce time in terms of how we do the
final merge of the buckets to produce the final result

* Revert "Collects more buckets than needed on shards"

This reverts commit 993c782d117892af9a3c86a51921cdee630a3ac5.

* Adds ability to merge within a rounding

* Fixes non-timezone doc test failure

* Fix time zone tests

* iterates on tests

* Adds test case and documentation changes

Added some notes in the documentation about the intervals that can be
returned.

Also added a test case that utilises the merging of consecutive buckets

* Fixes performance bug

The bug meant that getAppropriate rounding took a huge amount of time
if the range of the data was large but also sparsely populated. In
these situations the rounding would be very low so iterating through
the rounding values from the min key to the max key took a long time
(~120 seconds in one test).

The solution is to add a rough estimate first which chooses the
rounding based just on the long values of the min and max keys alone
but selects the rounding one lower than the one it thinks is
appropriate so the accurate method can choose the final rounding taking
into account the fact that intervals are not always fixed length.
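
A minimal sketch of the rough-estimate idea, assuming fixed-length intervals expressed in the same unit as the keys. `rough_rounding_index` is a hypothetical name and this is not the code from the commit; it only shows picking a starting rounding from the min and max keys and then stepping one level lower.

```python
def rough_rounding_index(min_key, max_key, target, intervals):
    """Return the index of the rounding to start the accurate search from."""
    # Default to the coarsest interval if nothing fits the target.
    estimate = len(intervals) - 1
    for index, size in enumerate(intervals):
        # Cheap estimate: buckets needed if every interval had this fixed size.
        if (max_key - min_key) // size + 1 <= target:
            estimate = index
            break
    # Step one rounding lower so the accurate method makes the final choice.
    return max(estimate - 1, 0)
```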

The commit also adds more tests

* Changes to only do complex reduction on final reduce

* merge latest with master

* correct tests and add a new test case for 10k buckets

* refactor to perform bucket number check in innerBuild

* correctly derive bucket setting, update tests to increase bucket threshold

* fix checkstyle

* address code review comments

* add documentation for default buckets

* fix typo
2018-07-13 13:08:35 -04:00
Mayya Sharipova 80492cacfc
Add second level of field collapsing (#31808)
* Put second level collapse under inner_hits

Closes #24855
2018-07-13 11:40:03 -04:00
Clinton Gormley bc1284eb28 Docs: Restyled cloud link in getting started 2018-07-13 15:48:14 +02:00
Clinton Gormley 9a928756e9 Docs: Change formatting of Cloud options 2018-07-13 15:40:38 +02:00
Alan Woodward a01e26a39b
Correct spelling of AnalysisPlugin#requriesAnalysisSettings (#32025)
Because this is a static method on a public API, and one that we encourage
plugin authors to use, the method with the typo is deprecated in 6.x
rather than just renamed.
2018-07-13 13:13:21 +01:00
Daniel Mitterdorfer f174f72fee
Circuit-break based on real memory usage
With this commit we introduce a new circuit-breaking strategy to the parent
circuit breaker. Contrary to the current implementation which only accounts for
memory reserved via child circuit breakers, the new strategy measures real heap
memory usage at the time of reservation. This allows us to be much more
aggressive with the circuit breaker limit so we bump it to 95% by default. The
new strategy is turned on by default and can be controlled with the new cluster
setting `indices.breaker.total.use_real_memory`.
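
As a rough illustration of the strategy described above (not the actual implementation), the parent breaker check can be thought of as comparing current heap usage plus the requested reservation against a limit. `reserve`, `CircuitBreakingError` and the parameter names are hypothetical; the 95% default mirrors the limit mentioned in this commit.

```python
class CircuitBreakingError(Exception):
    """Raised when a reservation would push heap usage past the limit."""


def reserve(heap_used, bytes_wanted, heap_max, limit_percent=95):
    """Check a reservation against real heap usage at reservation time."""
    limit = heap_max * limit_percent // 100
    projected = heap_used + bytes_wanted
    if projected > limit:
        raise CircuitBreakingError(
            f"{projected} bytes would exceed limit of {limit} bytes"
        )
    return projected
```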

Note that we turn it off for all integration tests with an internal test cluster
because it leads to spurious test failures which are of no value (we cannot
fully control heap memory usage in tests). All REST tests, however, will make
use of the real memory circuit breaker.

Relates #31767
2018-07-13 10:08:28 +02:00
Jimi Ford e955ffc38d Docs: fix typo in datehistogram (#31972) 2018-07-11 15:04:57 -04:00
Clinton Gormley aedbfc63cd Docs: Added note about cloud service to installation and getting started 2018-07-11 20:17:18 +02:00
Lisa Cawley efcfd0d827
[DOCS] Removes alternative docker pull example (#31934) 2018-07-11 09:08:32 -07:00
Sohaib Iftikhar 88c270d844 Added lenient flag for synonym token filter (#31484)
* Added lenient flag for synonym-tokenfilter.

Relates to #30968

* added docs for synonym-graph-tokenfilter

-- Also made lenient final
-- changed from !lenient to lenient == false

* Changes after review (1)

-- Renamed to ElasticsearchSynonymParser
-- Added explanation for ElasticsearchSynonymParser::add method
-- Changed ElasticsearchSynonymParser::logger instance to static

* Added lenient option for WordnetSynonymParser

-- also added more documentation

* Added additional documentation

* Improved documentation
2018-07-10 17:11:50 -04:00
Jim Ferenczi 584fa261cc
Remove the ability to index or query context suggestions without context (#31007)
This is a follow up of #30712 that removes the ability to index or query
a context enabled completion field without context.

Relates #30712
2018-07-09 16:01:01 +02:00
Armin Braun e46ed73379
Ingest: Add ignore_missing option to RemoveProc (#31693)
Added `ignore_missing` setting to the RemoveProcessor to fix #23086
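
A minimal sketch of the intended semantics, using a plain dict as the document. `remove_field` is a hypothetical helper, not the processor's code, and the `False` default is an assumption.

```python
def remove_field(document, field, ignore_missing=False):
    """Remove a field, optionally tolerating its absence."""
    if field not in document:
        if ignore_missing:
            # Nothing to remove; leave the document untouched.
            return document
        raise KeyError(f"field [{field}] not present in document")
    del document[field]
    return document
```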
2018-07-09 10:24:34 +02:00
Russ Cam 0dac73c4fb Remove link to oss-MSI (#31844)
This commit removes the link to an oss-MSI; there is only one version of the MSI, which includes X-Pack.

(cherry picked from commit d2e5db8a806ec8a25162f79db5209aceed4f30f7)
2018-07-09 11:31:38 +10:00
Costin Leau 9ffb26ab02
SQL: Remove restriction for single column grouping (#31818)
For historical reasons SQL restricts GROUP BY to only one field.
This commit removes the restriction and improves the test suite with
multi group by tests.

Close #31793
2018-07-06 20:55:27 +03:00
Piotr Prądzyński 99030e7af5 Docs: Inconsistency between description and example (#31858) 2018-07-06 12:44:20 -04:00
Christoph Büscher 3c11c7c261
[Docs] Add clarification to analysis example (#31826)
There have been at least two PRs trying to fix the spelling of "lazi" because it
isn't very clear from the example that the english analyzer will stem each token
in the example. This adds a short description of the analysis process to make
this clearer.

Relates to #31797
2018-07-06 14:36:58 +02:00
Christoph Büscher 450a450b2c
[Docs] Clarify accepted sort case (#31605)
Rescore only works with an explicit "sort" element if it is on descending
"_score". Even using "order" : "asc" will throw an error.
2018-07-06 10:11:36 +02:00
Nik Everett c0b2ef55b8 Docs: Explain _bulk?refresh shard targeting
Only the shards that receive the bulk request will be affected by
`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
documents in it that happen to be routed to different shards in an index
with five shards. The request will only wait for those three shards to
refresh. The other two shards that make up the index do not
participate in the `_bulk` request at all.
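
The example above can be sketched as follows. The CRC32 hash is only a stand-in for illustration (Elasticsearch uses its own routing hash), and `shard_for` and `refreshed_shards` are hypothetical names.

```python
import zlib


def shard_for(doc_id, num_shards):
    # Stand-in routing hash; not the one Elasticsearch actually uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def refreshed_shards(doc_ids, num_shards, shard_for=shard_for):
    """Return the set of shards that receive part of the bulk request.

    Only these shards are affected by `refresh`.
    """
    return {shard_for(doc_id, num_shards) for doc_id in doc_ids}
```

With three documents routed to three different shards of a five-shard index, the returned set has three members and the remaining two shards are never touched.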

Relates to #31819
2018-07-05 16:24:03 -04:00
Sohaib Iftikhar 40b822c878 Scripting: Remove support for deprecated StoredScript contexts (#31394)
Removes support for storing scripts without the usual json around the
script. So you can no longer do:
```
POST _scripts/<templatename>
{
    "query": {
        "match": {
            "title": "{{query_string}}"
        }
    }
}
```

and must instead do:
```
POST _scripts/<templatename>
{
    "script": {
        "lang": "mustache",
        "source": {
            "query": {
                "match": {
                    "title": "{{query_string}}"
                }
            }
        }
    }
}
```

This improves error reporting when you attempt to store a script but don't
quite get the syntax right. Before, there was a good chance that we'd
think of it as a "raw" template and just store it. Now we won't do that.
Nice.
2018-07-05 09:30:08 -04:00
Christoph Büscher 5f87a84bef
[Docs] Correct default window_size (#31582) 2018-07-04 14:07:20 +02:00
Lisa Cawley ac7fadd336
[DOCS] Starting Elasticsearch (#31701) 2018-07-03 13:40:37 -07:00
Jake Landis c0056cddd8
ingest: Introduction of a bytes processor (#31733)
ingest: Introduction of a bytes processor

This processor allows for human readable byte values (e.g. 1kb) to be converted to a value in bytes (e.g. 1024). Internally this processor re-uses "ByteSizeValue.parseBytesSizeValue" which supports conversions up to Long.MAX_VALUE and the following units: "b", "kb", "mb", "gb", "tb", "pb".
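
A rough sketch of the conversion, assuming whole-number values and the six units listed above. `parse_bytes` is a hypothetical stand-in and does not reproduce every rule of the real parser.

```python
import re

UNITS = {
    "b": 1,
    "kb": 1024,
    "mb": 1024 ** 2,
    "gb": 1024 ** 3,
    "tb": 1024 ** 4,
    "pb": 1024 ** 5,
}
LONG_MAX = 2 ** 63 - 1  # upper bound of a Java long


def parse_bytes(text):
    """Convert a human readable size such as '1kb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(b|kb|mb|gb|tb|pb)\s*", text.lower())
    if match is None:
        raise ValueError(f"failed to parse [{text}]")
    value = int(match.group(1)) * UNITS[match.group(2)]
    if value > LONG_MAX:
        raise ValueError(f"[{text}] is larger than Long.MAX_VALUE")
    return value
```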

This change also introduces a generic return type for the AbstractStringProcessor to allow for code reuse while supporting a String -> T conversion. (String -> Long in this case).
2018-07-03 10:40:56 -05:00
Costin Leau 093ea037b4 [DOCS] Typos 2018-07-03 17:19:48 +03:00
Costin Leau de9e56aa01
DOC: Add examples to the SQL docs (#31633)
Significantly improve the example snippets in the documentation.
The examples are part of the test suite and checked nightly.
To help readability, the existing dataset was extended (test_emp renamed
to emp plus library).
Improve output of JDBC tests to be consistent with the CLI
Add lenient flag to JDBC asserts to allow type widening (a long is
equivalent to an integer as long as the value is the same).
2018-07-03 16:56:31 +03:00
Daniel Mitterdorfer 3d53daeb2f
Account for XContent overhead in in-flight breaker
So far the in-flight request circuit breaker has only accounted for the
on-the-wire representation of a request. However, we convert the raw
request into XContent internally which increases the overhead.
Therefore, we increase the value of the corresponding setting
`network.breaker.inflight_requests.overhead` from one to two. While this
value is still rather conservative (we assume that the representation as
structured objects has no overhead compared to the byte[]), it is closer
to reality than the current value.
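
The effect of the overhead factor is simple arithmetic, sketched here with a hypothetical helper (`inflight_estimate` is not a real function in the codebase):

```python
def inflight_estimate(wire_bytes, overhead=2.0):
    """Bytes charged to the in-flight breaker for a request of wire_bytes."""
    return int(wire_bytes * overhead)
```

With the old value of one, a 1000 byte request is accounted as 1000 bytes; with the new value of two it is accounted as 2000 bytes.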

Relates #31613
2018-07-03 09:17:16 +02:00
Peter Evers ea15284230 Docs: Match the examples in the description (#31710)
Prose drifted from snippet.
2018-07-02 14:12:49 -04:00
Sohaib Iftikhar c55d11f8b5 rest-high-level: added get cluster settings (#31706)
Relates to #27205
2018-07-02 13:25:17 -04:00
Albert Zaharovits 85ec497056
[DOCS] Secure settings specified per node (#31621)
Make it clear that secure settings have to be set
on each cluster node.
2018-07-01 11:11:47 +03:00
Fredrik Meyer ffc8b82ea3 [Docs] Use capital letters in section headings (#31678)
Section headings should start with capital letters.
2018-06-29 11:58:39 +02:00
Lisa Cawley 5925611e9e
[DOCS] Fix licensing API details (#31667) 2018-06-28 15:38:41 -07:00
Lisa Cawley 101d675f90
[DOCS] Replace CONFIG_DIR with ES_PATH_CONF (#31635) 2018-06-28 08:27:04 -07:00
Nik Everett 0522c6644d Docs: Remove duplicate test setup
The range docs had an introductory section that described how to set up
an index *and* a test setup section in `docs/build.gradle` that
duplicated that section. This is bad because these sections can (and do)
drift from one another. This change removes the setup in build.gradle
and marks the introductory snippet with `// TESTSETUP` so it is used on
all the snippets.
2018-06-28 10:59:35 -04:00
Peter Evers 050fbc8f3d Docs: Fix description of percentile ranks example (#31652) 2018-06-28 09:29:56 -04:00
DeDe Morton 50e60a510d
Update reindex.asciidoc (#31626) 2018-06-27 12:46:29 -07:00
Piotr Prądzyński 4fc833b1de Unify headers for full text queries
Relates #31599
2018-06-27 10:11:14 +02:00
Piotr Prądzyński f6c64a048d Remove redundant 'minimum_should_match'
Relates #31600
2018-06-27 10:11:07 +02:00
Armin Braun 13e1cf6191
ingest: Add ignore_missing property to foreach filter (#22147) (#31578) 2018-06-26 20:04:41 +02:00
Julie Tibshirani 26a927a120
Fix a formatting issue in the docvalue_fields documentation. (#31563) 2018-06-26 10:15:56 -07:00
Sue Gallagher 357a07e7a2
[DOCS] Fix heading format errors (#31483)
* [DOCS] Fix heading format errors. Closes #31327
2018-06-25 17:25:32 -07:00
Jonathan Little 8e4768890a Migrate scripted metric aggregation scripts to ScriptContext design (#30111)
* Migrate scripted metric aggregation scripts to ScriptContext design #29328

* Rename new script context container class and add clarifying comments to remaining references to params._agg(s)

* Misc cleanup: make mock metric agg script inner classes static

* Move _score to an accessor rather than an arg for scripted metric agg scripts

This causes the score to be evaluated only when it's used.

* Documentation changes for params._agg -> agg

* Migration doc addition for scripted metric aggs _agg object change

* Rename "agg" Scripted Metric Aggregation script context variable to "state"

* Rename a private base class from ...Agg to ...State that I missed in my last commit

* Clean up imports after merge
2018-06-25 12:01:33 +01:00
Lisa Cawley 638b9fd88c
[DOCS] Move sql to docs (#31474) 2018-06-22 15:40:25 -07:00
Lisa Cawley eb81a305ae
[DOCS] Move monitoring to docs folder (#31477) 2018-06-22 15:39:34 -07:00
Lisa Cawley 68ec958873
[DOCS] Move migration APIs to docs (#31473) 2018-06-21 08:19:23 -07:00
Lisa Cawley f012de0f00
[DOCS] Move licensing APIs to docs (#31445) 2018-06-20 08:17:11 -07:00
Jonathan Pool 297e99c4c2 [Docs] Extend Homebrew installation instructions (#28902)
Adding a note about proceeding after a successful homebrew installation.
2018-06-20 14:20:51 +02:00
Peter Dyson e7a7b9689d [Docs] Mention ip_range datatypes on ip type page (#31416)
A link to the ip_range datatype page provides a way for newer users to know
it exists if they land directly on the ip datatype page first via a search.
2018-06-20 13:04:03 +02:00
Alan Woodward 5683bc60a6
Multiplexing token filter (#31208)
The `multiplexer` filter emits multiple tokens at the same position, each
version of the token having been passed through a different filter chain.
Identical tokens at the same position are removed.

This allows users to, for example, index lowercase and original-case tokens,
or stemmed and unstemmed versions, in the same field, so that they can search
for a stemmed term within x positions of an unstemmed term.
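
The behaviour can be sketched in a few lines. This is an illustration only: `multiplex` is a hypothetical function, filters are modelled as plain string functions, and an empty chain stands for passing the token through unchanged.

```python
def multiplex(tokens, filter_chains):
    """Emit each token once per filter chain at the same position,
    dropping identical tokens at that position."""
    out = []
    for position, token in enumerate(tokens):
        seen = set()
        for chain in filter_chains:
            result = token
            for token_filter in chain:
                result = token_filter(result)
            if result not in seen:
                seen.add(result)
                out.append((position, result))
    return out
```

For example, `multiplex(["Quick", "fox"], [[], [str.lower]])` keeps both `Quick` and `quick` at position 0 but only a single `fox` at position 1.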
2018-06-20 10:16:26 +01:00
Sue Gallagher b44e1c1978
[DOCS] Removed and params from MLT. Closes #28128 (#31370) 2018-06-19 13:48:13 -07:00
Lisa Cawley 8fd1f5fbed
[DOCS] Moves the info API to docs (#31121) 2018-06-19 10:33:57 -07:00
Nik Everett 5236d0291e
Docs: Advice for reindexing many indices (#31279)
Folks tend to want to be able to make a single `_reindex` call to
migrate many indices. You *can* do that and we even have an example of
how to do that in the docs but it isn't always a good idea. This change
adds some advice to the docs: generally you want to make one reindex
call per index.
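
One way to follow that advice is to generate one `_reindex` request body per index, as in this sketch. `reindex_requests` and the `-new` destination suffix are hypothetical; sending the requests is left out.

```python
def reindex_requests(indices, suffix="-new"):
    """Build one _reindex request body per source index."""
    return [
        {"source": {"index": name}, "dest": {"index": name + suffix}}
        for name in indices
    ]
```

Each body can then be sent as its own `POST _reindex` call, so a failure in one index does not affect the others.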

Closes #22920
2018-06-19 11:15:50 -04:00