Commit Graph

3 Commits

Author SHA1 Message Date
James Rodewig f04573f8e8
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) (#46459) 2019-09-06 16:09:09 -04:00
James Rodewig 1f36c4e50c
[DOCS] Replace "// CONSOLE" comments with [source,console] (#46159) (#46332) 2019-09-05 10:11:25 -04:00
Zachary Tong ea1794832f Add RareTerms aggregation (#35718)
This adds a `rare_terms` aggregation.  It is an aggregation designed
to identify the long-tail of keywords, e.g. terms that are "rare" or
have low doc counts.

This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).

This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again.  If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter.  If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".

The map keys are the "rare" terms after collection is done.
2019-07-01 10:30:02 -04:00