OpenSearch

Commit Graph

Author	SHA1	Message	Date
Britta Weber	a3cefd919e	significant terms: add google normalized distance, add chi square closes #6858	2014-08-04 08:15:26 +02:00
Britta Weber	74927adced	significant terms: infrastructure for changing easily the significance heuristic This commit adds the infrastructure to allow pluging in different measures for computing the significance of a term. Significance measures can be provided externally by overriding - SignificanceHeuristic - SignificanceHeuristicBuilder - SignificanceHeuristicParser closes #6561	2014-07-14 11:00:50 +02:00
Simon Willnauer	9d5507047f	Update Documentation Feature Flags [1.2.0]	2014-05-22 15:06:42 +02:00
Britta Weber	08e57890f8	use shard_min_doc_count also in TermsAggregation This was discussed in issue #6041 and #5998 . closes #6143	2014-05-14 14:10:04 +02:00
markharwood	1e560b0d92	Significant_terms agg: added option for a background_filter to define background context for analysis of term frequencies Closes #5944	2014-05-13 09:10:30 +01:00
Clinton Gormley	5b93255ec8	[DOCS] Added "Aggregation" to all aggs titles	2014-05-13 01:35:58 +02:00
Britta Weber	7944369fd1	Add `shard_min_doc_count` parameter for significant terms similar to `shard_size` Significant terms internally maintain a priority queue per shard with a size potentially lower than the number of terms. This queue uses the score as criterion to determine if a bucket is kept or not. If many terms with low subsetDF score very high but the `min_doc_count` is set high, this might result in no terms being returned because the pq is filled with low frequent terms which are all sorted out in the end. This can be avoided by increasing the `shard_size` parameter to a higher value. However, it is not immediately clear to which value this parameter must be set because we can not know how many terms with low frequency are scored higher that the high frequent terms that we are actually interested in. On the other hand, if there is no routing of docs to shards involved, we can maybe assume that the documents of classes and also the terms therein are distributed evenly across shards. In that case it might be easier to not add documents to the pq that have subsetDF <= `shard_min_doc_count` which can be set to something like `min_doc_count`/number of shards because we would assume that even when summing up the subsetDF across shards `min_doc_count` will not be reached. closes #5998 closes #6041	2014-05-07 18:02:56 +02:00
bleskes	5d832374dd	Update Documentation Feature Flags [1.1.0]	2014-03-25 17:51:30 +01:00
markharwood	5f1d9af9fe	Documentation fix for significant_terms heading levels	2014-03-17 12:17:54 +00:00
Boaz Leskes	ee8743f3f2	[Docs] added a missing reference to significantterms-aggergations Also fix header level mismatch issue reported by the build	2014-03-17 11:45:55 +01:00
markharwood	767bef0596	Significant_terms aggregation identifies terms that are significant rather than merely popular in a set. Significance is related to the changes in document frequency observed between everyday use in the corpus and frequency observed in the result set. The asciidocs include extensive details on the applications of this feature. Closes #5146	2014-03-14 10:34:24 +00:00

11 Commits