Zachary Tong 9cc33f4e29 [Rollup] Select best jobs then execute msearch-per-job (elastic/x-pack-elasticsearch#4152)
If there are multiple jobs that are all tied for "best" (e.g. they share the
best interval), we have no way of knowing which is actually the best.
Unfortunately, we cannot just filter for all the jobs in a single
search, because their doc_counts can potentially overlap.

To solve this, we execute an msearch-per-job so that the results
stay isolated.  When rewriting the response, we iteratively
unroll and reduce the independent msearch responses into a single
"working tree".  This allows us to intervene if there are
overlapping buckets and manually choose a doc_count.
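
As a rough illustration of that reduce step, here is a minimal, standalone
sketch in Java. It is not the actual implementation: the bucket
representation, the Math::max tie-break policy, and all names are
illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model: each msearch response contributes a map of
// date-histogram buckets (bucket key -> doc_count). Because every job was
// queried in isolation, two responses may carry a bucket for the same key;
// overlaps must resolve to a single doc_count rather than being summed.
public class WorkingTreeReduceSketch {

    /** Merge per-job bucket maps into one "working tree" of final buckets. */
    static Map<Long, Long> reduce(List<Map<Long, Long>> perJobBuckets) {
        Map<Long, Long> workingTree = new TreeMap<>();
        for (Map<Long, Long> jobBuckets : perJobBuckets) {
            for (Map.Entry<Long, Long> bucket : jobBuckets.entrySet()) {
                // Overlapping bucket: intervene and pick one doc_count.
                // Keeping the larger count is just one illustrative policy.
                workingTree.merge(bucket.getKey(), bucket.getValue(), Math::max);
            }
        }
        return workingTree;
    }

    public static void main(String[] args) {
        Map<Long, Long> jobA = Map.of(1000L, 5L, 2000L, 7L);
        Map<Long, Long> jobB = Map.of(2000L, 7L, 3000L, 2L); // overlaps at 2000
        System.out.println(reduce(List.of(jobA, jobB))); // {1000=5, 2000=7, 3000=2}
    }
}
```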

Jobs are selected by recursively descending through the aggregation
tree and independently pruning the list of valid job caps in each branch.
When a leaf node is reached in a branch, the remaining jobs are
sorted by "best'ness" (see the comparator in RollupJobIdentifierUtils for the
implementation) and added to a global set of "best jobs".  Once
all branches have been evaluated, the final set is returned to the
calling code.
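
The recursion can be pictured with a minimal Java sketch. AggNode, JobCaps,
and supports() are illustrative stand-ins, not the real classes:

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class JobSelectionSketch {

    interface AggNode {
        List<AggNode> children();
    }

    interface JobCaps {
        // Can this job's capabilities satisfy the given agg node?
        boolean supports(AggNode node);
    }

    static void selectBest(AggNode node, Set<JobCaps> candidates,
                           Comparator<JobCaps> bestness, Set<JobCaps> bestJobs) {
        // Prune: keep only the jobs whose caps can satisfy this node.
        Set<JobCaps> pruned = new HashSet<>();
        for (JobCaps job : candidates) {
            if (job.supports(node)) {
                pruned.add(job);
            }
        }
        if (node.children().isEmpty()) {
            // Leaf: sort the survivors by "best'ness" and keep the winner
            // (a tie would contribute several jobs to the global set).
            pruned.stream().min(bestness).ifPresent(bestJobs::add);
        } else {
            // Branch: each child prunes independently from this node's survivors.
            for (AggNode child : node.children()) {
                selectBest(child, pruned, bestness, bestJobs);
            }
        }
    }
}
```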

Job "best'ness" is, briefly, the job(s) that have
 - The largest compatible date interval
 - Fewer and larger interval histograms
 - Fewer terms groups

Note: the final set of "best" jobs is not guaranteed to be minimal;
there may be redundant effort, since independent branches can choose
jobs that are subsumed by the jobs chosen in other branches.
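
A hedged sketch of such a comparator, where the "best" job compares smallest
(the real ordering lives in RollupJobIdentifierUtils; the Caps record and its
fields are invented for illustration):

```java
import java.util.Comparator;

class BestnessSketch {

    record Caps(long dateIntervalMillis, int numHistoGroups,
                long histoIntervalMillis, int numTermsGroups) {}

    static final Comparator<Caps> BESTNESS =
        Comparator.comparingLong(Caps::dateIntervalMillis).reversed()  // larger date interval first
            .thenComparingInt(Caps::numHistoGroups)                    // fewer interval histograms
            .thenComparing(Comparator.comparingLong(Caps::histoIntervalMillis).reversed()) // larger histo intervals
            .thenComparingInt(Caps::numTermsGroups);                   // fewer terms groups
}
```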

Related changes:
- We have to include the job's ID in the rollup doc's
hash so that different jobs don't overwrite the same summary
document (see the sketch after this list).
- Now that we iteratively reduce the agg tree, the agg framework
injects empty buckets while we're working.  In most cases this
is harmless, but for `avg` aggs the empty bucket is a SumAgg while
any unrolled versions are converted into AvgAggs, causing a cast
exception.  To get around this, avgs are renamed to
`{source_name}.value` to prevent the conflict.
- The job filtering has been pushed up into a query filter, since it
applies to the entire msearch rather than just individual agg components
- We no longer add a filter agg clause for the date_histo's interval, because
that is handled by the job validation and pruning.
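
To illustrate the first point, here is a minimal sketch of deriving a rollup
document ID that folds in the job ID. The helper, the hashing scheme, and the
"$" separator are assumptions for illustration, not the actual implementation:

```java
import java.util.Objects;

class RollupDocIdSketch {

    // Two jobs can produce the identical group-key tuple, so the job ID must
    // participate in the document ID or their summary docs would collide.
    static String docId(String jobId, Object... groupKey) {
        // Hash the group key *and* the job ID together (illustrative scheme).
        int hash = 31 * Objects.hash(groupKey) + jobId.hashCode();
        return jobId + "$" + Integer.toHexString(hash);
    }

    public static void main(String[] args) {
        // Identical group tuples from different jobs map to distinct doc IDs.
        System.out.println(docId("job-a", "2018-03-27T00:00:00Z", "host-1"));
        System.out.println(docId("job-b", "2018-03-27T00:00:00Z", "host-1"));
    }
}
```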

Original commit: elastic/x-pack-elasticsearch@995be2a039

= Elasticsearch X-Pack

A set of Elastic's commercial plugins for Elasticsearch:

- License
- Security
- Watcher
- Monitoring
- Machine Learning
- Graph

= Setup

You must check out `x-pack-elasticsearch` and `elasticsearch` in a specific directory structure. The
`elasticsearch` checkout will be used when building `x-pack-elasticsearch`. The structure is:

- /path/to/elastic/elasticsearch
- /path/to/elastic/elasticsearch-extra/x-pack-elasticsearch

== Vault Secret

The build requires a Vault Secret ID. You can use a GitHub token by following these steps:

1. Go to https://github.com/settings/tokens
2. Click *Generate new token*
3. Set permissions to `read:org`
4. Copy the token into `~/.elastic/github.token`
5. Set the token's file permissions to `600`

```
$ mkdir ~/.elastic
$ vi ~/.elastic/github.token
# Add your_token exactly as it is into the file and save it
$ chmod 600 ~/.elastic/github.token
```

If you do not create the token, you will see a failure along these lines when trying to build X-Pack:

```
* What went wrong:
Missing ~/.elastic/github.token file or VAULT_SECRET_ID environment variable, needed to authenticate with vault for secrets
```

=== Offline Mode

When running the build in offline mode (`--offline`), the Vault secret setup is not required.

== Native Code

**This is mandatory, as the tests depend on it.**

Machine Learning requires platform-specific binaries, built from https://github.com/elastic/ml-cpp via CI servers.

= Build

- Run unit tests:
+
[source, txt]
-----
gradle clean test
-----

- Run all tests:
+
[source, txt]
-----
gradle clean check
-----

- Run integration tests:
+
[source, txt]
-----
gradle clean integTest
-----

- Package X-Pack (without running tests):
+
[source, txt]
-----
gradle clean assemble
-----

- Install X-Pack (without running tests):
+
[source, txt]
-----
gradle clean install
-----

= Building documentation

The source files in this repository can be included in either the X-Pack
Reference or the Elasticsearch Reference.

NOTE: In 5.4 and later, the Elasticsearch Reference includes X-Pack-specific
content that is pulled from this repo.

To build the Elasticsearch Reference on your local machine, use the `docbldes` 
or `docbldesx` build commands defined in
https://github.com/elastic/docs/blob/master/doc_build_aliases.sh

== Adding Images

When you include an image in the documentation, specify the path relative to the
location of the asciidoc file. By convention, we put images in an `images`
subdirectory.

For example, to insert `watcher-ui-edit-watch.png` in `watcher/limitations.asciidoc`:

. Add an `images` subdirectory to the watcher directory if it doesn't already exist.
. In `limitations.asciidoc`, specify:
+
[source, txt]
-----
 image::images/watcher-ui-edit-watch.png["Editing a watch"]
-----

Please note that image names and anchor IDs must be unique within the book, so
do not use generic identifiers.