OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-03-24 17:09:48 +00:00

History

Dimitris Athanasiou 9d9572e2b2 Reintroduce chunking to improve data extractor performance (elastic/elasticsearch#849)

* Reintroduce chunking to improve data extractor performance

Performing a sorted search/scroll over a period of time that matches
a lot of documents is very expensive because for each page all
documents are traversed.

The solution is to chunk the search time and perform separate
search/scrolls for each chunk.
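
As a rough illustration of the chunking idea above, the sketch below splits a search time range into consecutive half-open intervals, each of which would get its own search/scroll. The class name `TimeChunker`, the method `split`, and the use of epoch milliseconds are assumptions made for this example and are not taken from the commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits a search time range into consecutive chunks. */
final class TimeChunker {

    /**
     * Returns half-open [from, to) intervals in epoch milliseconds that cover
     * [startMs, endMs) without gaps or overlaps. The last chunk may be shorter.
     */
    static List<long[]> split(long startMs, long endMs, long chunkMs) {
        if (chunkMs <= 0) {
            throw new IllegalArgumentException("chunkMs must be positive");
        }
        List<long[]> chunks = new ArrayList<>();
        long from = startMs;
        while (from < endMs) {
            // Compare against the remaining span to avoid overflow in from + chunkMs.
            long to = (endMs - from > chunkMs) ? from + chunkMs : endMs;
            chunks.add(new long[] {from, to});
            from = to;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // One day of data split into 6-hour chunks (hypothetical values).
        for (long[] c : split(0L, 86_400_000L, 21_600_000L)) {
            System.out.println("search/scroll over [" + c[0] + ", " + c[1] + ")");
        }
    }
}
```

Each interval bounds the number of documents a single sorted scroll has to traverse per page, which is where the saving is expected to come from.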

This commit introduces a new `chunk` config in `datafeed_config`
whose mode can be set to AUTO, OFF, or MANUAL. MANUAL allows an
explicit chunk size to be specified.

When set to AUTO, a heuristic is used in order to determine the chunk
size. The heuristic is based on estimating the time interval within
which we expect `scroll_size` documents and then taking the 10x multiple
of that. Based on benchmarking, this method gives a dramatic performance
increase. For example, for the citizens dataset it improved the ingest
rate from 0.33M docs / minute to 13.6M docs / minute. Farequote is now
done in ~1 second.
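
The AUTO heuristic described above could be sketched as follows. The commit message does not say how the interval holding `scroll_size` documents is estimated, so this example assumes documents are spread evenly between the earliest and latest timestamps. The class `AutoChunkEstimator`, its method, and its parameters are hypothetical names for illustration; only the 10x multiplier comes from the commit message.

```java
/** Illustrative only: one possible way to derive an AUTO chunk size. */
final class AutoChunkEstimator {

    /** The commit message states that the estimated interval is multiplied by 10. */
    static final int MULTIPLIER = 10;

    /**
     * Estimates a chunk size in milliseconds.
     * Assumption (not from the commit): documents are evenly distributed over
     * [earliestMs, latestMs], so the interval expected to hold scrollSize
     * documents is span * scrollSize / totalDocs.
     */
    static long estimateChunkMs(long earliestMs, long latestMs, long totalDocs, int scrollSize) {
        if (totalDocs <= 0 || scrollSize <= 0 || latestMs < earliestMs) {
            throw new IllegalArgumentException("invalid input");
        }
        double spanMs = (double) (latestMs - earliestMs);
        double intervalPerScrollMs = spanMs * scrollSize / totalDocs;
        long chunkMs = (long) Math.ceil(intervalPerScrollMs * MULTIPLIER);
        return Math.max(1L, chunkMs); // never return a zero-length chunk
    }

    public static void main(String[] args) {
        // Hypothetical data: 1,000,000 docs over 1,000,000 ms, scroll_size 1000.
        System.out.println(estimateChunkMs(0L, 1_000_000L, 1_000_000L, 1000) + " ms");
    }
}
```

Under the even-distribution assumption, each chunk would then hold roughly ten scroll pages of documents.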

Finally, note that when `chunk` is not specified, it defaults to AUTO
when aggregations are not set and to OFF otherwise. This is because
the chunk size heuristic does not lend itself well to aggregations
where one needs to chunk based on the cardinality of buckets rather
than simply time.
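
The defaulting rule above can be expressed in a few lines. This is a minimal sketch: the class `ChunkDefaults`, the `Mode` enum, and the `resolve` method are hypothetical names, and representing "not specified" as `null` is an assumption of this example.

```java
/** Illustrative only: resolves the effective chunk mode for a datafeed. */
final class ChunkDefaults {

    /** The three modes named in the commit message. */
    enum Mode { AUTO, OFF, MANUAL }

    /**
     * Returns the configured mode if one was given. Otherwise defaults to
     * AUTO when no aggregations are set and to OFF when they are.
     * A null argument stands for "chunk not specified" (an assumption here).
     */
    static Mode resolve(Mode configured, boolean hasAggregations) {
        if (configured != null) {
            return configured;
        }
        return hasAggregations ? Mode.OFF : Mode.AUTO;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, false)); // no chunk config, no aggregations
        System.out.println(resolve(null, true));  // no chunk config, with aggregations
    }
}
```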

Relates to elastic/elasticsearch#734

Original commit: elastic/x-pack-elasticsearch@a738e86d21

2017-02-03 15:50:01 +00:00

licenses

Change the way 3rd party licenses are distributed for the C++ components (elastic/elasticsearch#366)

2016-12-01 15:16:57 +00:00

src

Reintroduce chunking to improve data extractor performance (elastic/elasticsearch#849)

2017-02-03 15:50:01 +00:00

.gitignore

Moves Java code to fit x-pack style structure

2016-11-18 16:35:00 +00:00

build.gradle

Move the named pipe no bootstrap test to a separate qa module (elastic/elasticsearch#769)

2017-01-23 12:08:35 +00:00

gradle.properties

Centralises where the version is defined

2016-12-02 15:17:49 +00:00

README.asciidoc

Rename prelert to ml (elastic/elasticsearch#681)

2017-01-10 13:40:16 +00:00

README.asciidoc

= Elasticsearch ML Plugin

Behavioral Analytics for Elasticsearch