OpenSearch

Commit Graph

Author	SHA1	Message	Date
Colin Goodheart-Smithe	b9f4d44b14	Aggregations: Adds GeoBounds Aggregation The GeoBounds Aggregation is a new single bucket aggregation which outputs the coordinates of a bounding box containing all the points from all the documents passed to the aggregation as well as the doc count. Geobound Aggregation also use a wrap_logitude parameter which specifies whether the resulting bounding box is permitted to overlap the international date line. This option defaults to true. This aggregation introduces the idea of MetricsAggregation which do not return double values and cannot be used for sorting. The existing MetricsAggregation has been renamed to NumericMetricsAggregation and is a subclass of MetricsAggregation. MetricsAggregations do not store doc counts and do not support child aggregations. Closes #5634	2014-06-03 15:59:56 +01:00
javanna	5a1ad7b42e	[DOCS] fixed curl requests in benchmark docs	2014-06-03 11:47:13 +02:00
leonardo menezes	f3eca05c3b	[DOCS] removed slowest on single query benchmark requests Relates to #5904	2014-06-03 11:47:13 +02:00
Clinton Gormley	7fff6f1f43	Docs: Tidied percolate.asciidoc	2014-05-30 11:56:06 +02:00
Martijn van Groningen	aab38fb2e6	Aggregations: added pagination support to `top_hits` aggregation by adding `from` option. Closes #6299	2014-05-30 11:45:31 +02:00
Martijn van Groningen	5fafd2451a	Added `top_hits` aggregation that keeps track of the most relevant document being aggregated per bucket. Closes #6124	2014-05-23 16:01:18 +02:00
Nik Everett	3573822b7e	Highlight fields in request order Because json objects are unordered this also adds an explicit order syntax that looks like "highlight": { "fields": [ {"title":{ /params/ }}, {"text":{ /params/ }} ] } This is not useful for any of the builtin highlighters but will be useful in plugins. Closes #4649	2014-05-22 16:44:14 +02:00
Simon Willnauer	9d5507047f	Update Documentation Feature Flags [1.2.0]	2014-05-22 15:06:42 +02:00
Clinton Gormley	f950344546	[DOCS] Fixed title levels in context suggester	2014-05-21 20:47:25 +02:00
Simon Willnauer	ec3b1c57ac	Move Benchmark release to 1.3	2014-05-21 10:17:59 +02:00
Britta Weber	08e57890f8	use shard_min_doc_count also in TermsAggregation This was discussed in issue #6041 and #5998 . closes #6143	2014-05-14 14:10:04 +02:00
Clinton Gormley	ff12585fea	Improved wording in search-type.asciidoc Closes #5951	2014-05-14 12:15:48 +02:00
David Pilato	1cb2c3bdd3	[DOCS] reverse-nested aggs are added in 1.2.0	2014-05-13 20:00:42 +02:00
Tiago Alves Macambira	a8242e6c8c	Clarify `missing` behavior.	2014-05-13 15:49:46 +02:00
Adrien Grand	cc530b9037	Use t-digest as a dependency. Our improvements to t-digest have been pushed upstream and t-digest also got some additional nice improvements around memory usage and speedups of quantile estimation. So it makes sense to use it as a dependency now. This also allows to remove the test dependency on Apache Mahout. Close #6142	2014-05-13 10:38:08 +02:00
Clinton Gormley	3aac594503	[DOCS] Fix typos in context suggest	2014-05-13 10:34:16 +02:00
markharwood	1e560b0d92	Significant_terms agg: added option for a background_filter to define background context for analysis of term frequencies Closes #5944	2014-05-13 09:10:30 +01:00
Clinton Gormley	5b93255ec8	[DOCS] Added "Aggregation" to all aggs titles	2014-05-13 01:35:58 +02:00
Rashid Khan	233aaa63c9	Change key to keyed	2014-05-12 13:15:07 -07:00
Alex Ksikes	dae48d9fe8	Added the ability to include the queried document for More Like This API. By default More Like This API excludes the queried document from the response. However, when debugging or when comparing scores across different queries, it could be useful to have the best possible matched hit. So this option lets users explicitly specify the desired behavior. Closes #6067	2014-05-09 12:59:39 +02:00
Alex Ksikes	48b7172ee7	Provided some insights as to how More Like This works internally. In the Google Groups forum there appears to be some confusion as to what mlt does. This documentation update should hopefully help demystifying this feature, and provide some understanding as to how to use its parameters. Closes #6092	2014-05-09 12:13:29 +02:00
Andrew Selden	f23274523a	Integration tests for benchmark API. - Randomized integration tests for the benchmark API. - Negative tests for cases where the cluster cannot run benchmarks. - Return 404 on missing benchmark name. - Allow to specify 'types' as an array in the JSON syntax when describing a benchmark competition. - Don't record slowest for single-request competitions. Closes #6003, #5906, #5903, #5904	2014-05-07 14:14:54 -07:00
uboness	fc52db1209	Changed the respnose structure of the percentiles aggregation where now all the percentiles are placed under a `values` object (or `values` array in case the `keyed` flag is set to `false` Closes #5870	2014-05-07 18:35:24 +02:00
Britta Weber	7944369fd1	Add `shard_min_doc_count` parameter for significant terms similar to `shard_size` Significant terms internally maintain a priority queue per shard with a size potentially lower than the number of terms. This queue uses the score as criterion to determine if a bucket is kept or not. If many terms with low subsetDF score very high but the `min_doc_count` is set high, this might result in no terms being returned because the pq is filled with low frequent terms which are all sorted out in the end. This can be avoided by increasing the `shard_size` parameter to a higher value. However, it is not immediately clear to which value this parameter must be set because we can not know how many terms with low frequency are scored higher that the high frequent terms that we are actually interested in. On the other hand, if there is no routing of docs to shards involved, we can maybe assume that the documents of classes and also the terms therein are distributed evenly across shards. In that case it might be easier to not add documents to the pq that have subsetDF <= `shard_min_doc_count` which can be set to something like `min_doc_count`/number of shards because we would assume that even when summing up the subsetDF across shards `min_doc_count` will not be reached. closes #5998 closes #6041	2014-05-07 18:02:56 +02:00
gabriel-tessier	7b0efcbd96	fix typo	2014-05-06 15:54:36 +02:00
Audrey	52d2f2d229	[DOCS] Update phrase-suggest.asciidoc Grammatical error Close #5993	2014-05-06 10:28:13 +02:00
Martijn van Groningen	013b319415	Added `reverse_nested` aggregation. The `reverse_nested` aggregation allows to aggregate on properties outside of the nested scope of a `nested` aggregation. Closes #5507	2014-05-01 00:23:05 +07:00
Lee Hinman	57bee03193	[DOCS] Add /_search_shards documentation	2014-04-22 08:54:32 -06:00
Clinton Gormley	3ba8fbbef8	Update benchmark.asciidoc Fixed incorrect parameter spec for benchmark nodes	2014-04-22 14:16:10 +02:00
Clinton Gormley	0e782331be	Update benchmark.asciidoc	2014-04-21 20:39:33 +02:00
David Pilato	f3fe50aac4	[DOCS] fix typo	2014-04-19 22:44:44 +02:00
Scott Wilkerson	9ea0e3a95b	Update percolate.asciidoc fix typo	2014-04-15 16:01:44 +02:00
Andrew Selden	2cf66c4115	Benchmark documentation Moving benchmark documentation under the search section. Closes #5786	2014-04-14 14:08:41 -07:00
Malte Schirnacher	8ce3bba010	Fix typos in percolate.asciidoc Close #5762 #5763 #5764	2014-04-11 18:09:16 +02:00
Andrew O'Brien	48031b6236	Fixes typo in "Scan" search type documention	2014-04-07 16:01:37 -06:00
gabriel-tessier	000c33aac3	fix typo	2014-04-07 09:23:46 +02:00
Martijn van Groningen	ade1d0ef57	Added global ordinals (unique incremental numbering for terms) to fielddata. Added a terms aggregation implementations that work on global ordinals, which is also the default. Closes #5672	2014-04-07 11:06:41 +07:00
Karl Meisterheim	6d993bc810	[DOCS] A few grammar and word use corrections	2014-04-04 19:26:38 +02:00
Alexander Reelsen	e547e113e1	Geo context suggester: Require precision in mapping The default precision was way too exact and could lead people to think that geo context suggestions are not working. This patch now requires you to set the precision in the mapping, as elasticsearch itself can never tell exactly, what the required precision for the users suggestions are. Closes #5621	2014-04-02 23:51:14 +02:00
Hannes Korte	c11293ad78	Fix some typos in documentation.	2014-03-31 13:48:17 +02:00
bleskes	5d832374dd	Update Documentation Feature Flags [1.1.0]	2014-03-25 17:51:30 +01:00
Boaz Leskes	fc8dc3f733	[Docs] updated the search template and query template docs	2014-03-25 15:25:02 +01:00
Alexander Reelsen	4fc461a97c	[DOCS] Moved the template query documentation into search section	2014-03-25 10:01:41 +01:00
Simon Willnauer	b4e504df99	[Docs] Add coming tag for context suggester docs	2014-03-25 09:46:49 +01:00
uboness	7d6ad8d91c	Added extended_bounds support for date_/histogram aggs By default the date_/histogram returns all the buckets within the range of the data itself, that is, the documents with the smallest values (on which with histogram) will determine the min bucket (the bucket with the smallest key) and the documents with the highest values will determine the max bucket (the bucket with the highest key). Often, when when requesting empty buckets (min_doc_count : 0), this causes a confusion, specifically, when the data is also filtered. To understand why, let's look at an example: Lets say the you're filtering your request to get all docs from the last month, and in the date_histogram aggs you'd like to slice the data per day. You also specify min_doc_count:0 so that you'd still get empty buckets for those days to which no document belongs. By default, if the first document that fall in this last month also happen to fall on the first day of the second week of the month, the date_histogram will not return empty buckets for all those days prior to that second week. The reason for that is that by default the histogram aggregations only start building buckets when they encounter documents (hence, missing on all the days of the first week in our example). With extended_bounds, you now can "force" the histogram aggregations to start building buckets on a specific min values and also keep on building buckets up to a max value (even if there are no documents anymore). Using extended_bounds only makes sense when min_doc_count is 0 (the empty buckets will never be returned if the min_doc_count is greater than 0). Note that (as the name suggest) extended_bounds is not filtering buckets. Meaning, if the min bounds is higher than the values extracted from the documents, the documents will still dictate what the min bucket will be (and the same goes to the extended_bounds.max and the max bucket). For filtering buckets, one should nest the histogram agg under a range filter agg with the appropriate min/max. Closes #5224	2014-03-20 14:48:27 +01:00
markharwood	5f1d9af9fe	Documentation fix for significant_terms heading levels	2014-03-17 12:17:54 +00:00
Randy Stauner	1486188a3b	[DOCS] Reword clear-scroll sentence	2014-03-17 12:08:49 +01:00
Boaz Leskes	ee8743f3f2	[Docs] added a missing reference to significantterms-aggergations Also fix header level mismatch issue reported by the build	2014-03-17 11:45:55 +01:00
rphadake	36a0cb99d7	[Doc] doc updates for date histogram interval Close #5308	2014-03-14 18:55:32 +01:00
Adrien Grand	eef71da650	[Doc] Add a chart about the relative error of the percentiles aggregation.	2014-03-14 12:23:23 +01:00
markharwood	767bef0596	Significant_terms aggregation identifies terms that are significant rather than merely popular in a set. Significance is related to the changes in document frequency observed between everyday use in the corpus and frequency observed in the result set. The asciidocs include extensive details on the applications of this feature. Closes #5146	2014-03-14 10:34:24 +00:00
Adrien Grand	5821fa042c	Cardinality aggregation. This aggregation computes unique term counts using the hyperloglog++ algorithm which uses linear counting to estimate low cardinalities and hyperloglog on higher cardinalities. Since this algorithm works on hashes, it is useful for high-cardinality fields to store the hash of values directly in the index, which is the purpose of the new `murmur3` field type. This is less necessary on low-cardinality string fields because the aggregator is smart enough to only compute the hash once per unique value per segment thanks to ordinals, or on numeric fields since hashing them is very fast. Close #5426	2014-03-13 19:19:56 +01:00
Florian Schilling	81e537bd5e	ContextSuggester ================ This commit extends the `CompletionSuggester` by context informations. In example such a context informations can be a simple string representing a category reducing the suggestions in order to this category. Three base implementations of these context informations have been setup in this commit. - a Category Context - a Geo Context All the mapping for these context informations are specified within a context field in the completion field that should use this kind of information.	2014-03-13 11:24:46 +01:00
Kurt Hurtado	ca6a2bb790	[DOCS] Various aggregation doc fixes	2014-03-13 09:05:25 +01:00
Boaz Leskes	b7a95d11a7	Introduced VersionType.FORCE & VersionType.EXTERNAL_GTE Also added "external_gt" as an alias name for VersionType.EXTERNAL , accessible for the rest layer. Closes #4213 , Closes #2946	2014-03-10 21:07:17 +01:00
Simon Willnauer	fbb8c0fafa	[DOCS] Add `coming` tag to multiple rescores Closes #5365	2014-03-10 09:27:44 +01:00
Benjamin Devèze	2affa5004f	Fix small typo in percentiles doc	2014-03-07 10:10:19 +01:00
Adrien Grand	f359b7f38b	[DOC] The percentiles aggregation is coming in 1.1.0.	2014-03-07 10:03:15 +01:00
uboness	9d0fc76f54	Added support for sorting buckets based on sub aggregations Supports sorting on sub-aggs down the current hierarchy. This is supported as long as the aggregation in the specified order path are of a single-bucket type, where the last aggregation in the path points to either a single-bucket aggregation or a metrics one. If it's a single-bucket aggregation, the sort will be applied on the document count in the bucket (i.e. doc_count), and if it is a metrics type, the sort will be applied on the pointed out metric (in case of a single-metric aggregations, such as avg, the sort will be applied on the single metric value) NOTE: this commit adds a constraint on what should be considered a valid aggregation name. Aggregations names must be alpha-numeric and may contain '-' and '_'. Closes #5253	2014-03-06 00:05:27 +01:00
Zachary Tong	7b16c5857d	Percentiles aggregation. A new metric aggregation that can compute approximate values of arbitrary percentiles. Close #5323	2014-03-03 18:06:14 +01:00
Binh Ly	7e49848697	Clarify range aggregations	2014-02-28 14:38:57 -05:00
Clinton Gormley	53ce0e8e27	[DOCS] Fixed added[] tag version number	2014-02-28 15:29:43 +01:00
Luca Cavanna	4e6610a798	Fixed multi term queries support in postings highlighter for non top-level queries In #4052 we added support for highlighting multi term queries using the postings highlighter. That worked only for top-level queries though, and not for multi term queries that are nested for instance within a bool query, or filtered query, or a constant score query. The way we make this work is by walking the query structure and temporarily overriding the query rewrite method with a method that allows for multi terms extraction. Closes #5102	2014-02-21 21:43:40 +01:00
Britta Weber	db3c6c2a8e	Enable percolation for nested documents closes #5082	2014-02-14 22:42:33 +01:00
uboness	d335630e57	[docs] fixed errors in aggs docs - error in nested aggs example - error in terms aggs example	2014-02-13 20:36:02 +01:00
Luca Cavanna	179750f0f5	[DOCS] fixed count docs, it now requires a top-level query object, same as other apis Relates to #4074	2014-02-13 13:36:20 +01:00
Luca Cavanna	01abea5945	[DOCS] fixed count and validate query docs, they now require a top-level query object, same as other apis Relates to #4074 Closes #5111	2014-02-13 11:42:04 +01:00
Simon Willnauer	990ce658a4	[Docs] Remove `custom_score` from documentation and add a migration section.	2014-02-11 14:59:15 +01:00
Clinton Gormley	93930d6dc7	Removed 0.90.* deprecation and addition notifications Closes #5052	2014-02-07 20:52:49 +01:00
Adrien Grand	9cb17408cb	Make size=0 return all buckets for the geohash_grid aggregation. Close #4875	2014-02-07 09:55:10 +01:00
Boaz Leskes	9bf263c741	[DOCS] Fix terms agg value script example	2014-02-06 16:35:49 +01:00
Boaz Leskes	ae4ed29f9b	[Docs] value_count supports script per 1.1	2014-02-06 15:04:50 +01:00
Clinton Gormley	6238d406b5	[DOCS] Removed the experimental label from Tribe, Hot Threads and Completion Suggester	2014-02-06 14:19:17 +01:00
Adrien Grand	6777be60ce	Add script support to value_count aggregations. Close #5001	2014-02-04 14:29:32 +01:00
Clinton Gormley	238b26a466	[DOC] Tidied up geohashgrid aggregations	2014-02-04 11:54:32 +01:00
Jun Ohtani	ba415b8ad2	Does not support "script" in value_clunt aggregation.	2014-02-04 10:26:07 +01:00
Adrien Grand	cc1ff560df	Rename `geohashgrid` to `geohash_grid` in documentation. It was renamed in `fc6bc4c477`. Close #4997	2014-02-04 09:39:55 +01:00
Lars Francke	1bd9dc129b	Fix confusing sentence The original sentence didn't make much sense. I hope this is a bit better. Taken heavy inspiration from `c63d8c4fb5`	2014-02-03 17:20:40 +01:00
Lars Francke	7cbd0962b5	Improve Aggregations documentation * Mostly minor things like typos and grammar stuff * Some clarifications * The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated	2014-02-03 17:16:52 +01:00
uboness	d3f2173ef9	fixed date_/histogram aggregation documentation - added documentation for the `min_doc_count` setting Closes #4944	2014-01-29 20:55:26 +01:00
uboness	9f04e5fe38	fixed nested example response in docs Closes #4935	2014-01-29 13:09:12 +01:00
uboness	dd389d1cc5	Made all multi-bucket aggs return consistent response format Closes #4926	2014-01-28 17:46:57 +01:00
Nik Everett	93a8e80aff	Support multiple rescores Detects if rescores arrive as an array instead of a plain object. If so then parse each element of the array as a separate rescore to be executed one after another. It looks like this: "rescore" : [ { "window_size" : 100, "query" : { "rescore_query" : { "match" : { "field1" : { "query" : "the quick brown", "type" : "phrase", "slop" : 2 } } }, "query_weight" : 0.7, "rescore_query_weight" : 1.2 } }, { "window_size" : 10, "query" : { "score_mode": "multiply", "rescore_query" : { "function_score" : { "script_score": { "script": "log10(doc['numeric'].value + 2)" } } } } } ] Rescores as a single object are still supported. Closes #4748	2014-01-23 16:29:07 +01:00
Nik Everett	37f80c8d80	Documentation for score_mode Closes #4742	2014-01-23 16:24:48 +01:00
Clinton Gormley	8685818ad3	[DOCS] Moved termvector and mtermvectors from search to docs	2014-01-22 14:10:26 +01:00
Simon Willnauer	cb3bcb05be	[DOCS]: Fix added version termvectors.asciidoc	2014-01-22 12:08:13 +01:00
Adrien Grand	9282ae4ffd	Terms aggregations: make size=0 return all terms. Terms aggregations return up to `size` terms, so up to now, the way to get all matching terms back was to set `size` to an arbitrary high number that would be larger than the number of unique terms. Terms aggregators already made sure to not allocate memory based on the `size` parameter so this commit mostly consists in making `0` an alias for the maximum integer value in the TermsParser. Close #4837	2014-01-22 11:05:10 +01:00
Lee Hinman	2c289fb538	Add the ability to retrieve fields from field data Adds a new FetchSubPhase, FieldDataFieldsFetchSubPhase, which loads the field data cache for a field and returns an array of values for the field. Also removes `doc['<field>']` and `_source.<field>` workaround no longer needed in field name resolving. Closes #4492	2014-01-21 09:13:32 -07:00
Martijn van Groningen	9bc3d996ff	[SPECS] Updated percolator specs.	2014-01-20 18:18:27 +01:00
Florian Gilcher	eed079aaac	Reference docs fixes * Make it clearer that `aggs` is an allowed synomym for the `aggregations` key * Fix broken example in for datehistogram, `1.5M` is not an allowed interval * Make use of colon before examples consistent * Fix typos	2014-01-20 12:14:17 +01:00
Dawid Weiss	ae71b25145	Documentation typo.	2014-01-20 11:51:08 +01:00
Luca Cavanna	4126ae2631	[DOCS] updated json responses after #4310 and #4480 - Removed "ok": true from response examples - Added "created" flag to index response examples - Replaced exists flag with found in delete response examples	2014-01-16 12:01:39 +01:00
markharwood	2795f4e55d	Standardized use of “_length” for parameter names rather than “_len”. Java Builder apis drop old “len” methods in favour of new “length” Rest APIs support both old “len: and new “length” forms using new ParseField class to a) provide compiler-checked consistency between Builder and Parser classes and b) a common means of handling deprecated syntax in the DSL. Documentation and rest specs only document the new “*length” forms Closes #4083	2014-01-13 15:59:15 +00:00
Adrien Grand	5c237fe834	Add new option `min_doc_count` to terms and histogram aggregations. `min_doc_count` is the minimum number of hits that a term or histogram key should match in order to appear in the response. `min_doc_count=0` replaces `compute_empty_buckets` for histograms and will behave exactly like facets' `all_terms=true` for terms aggregations. Close #4662	2014-01-13 10:09:38 +01:00
Martijn van Groningen	943b62634c	Replaced the multi-field type in favour for the multi fields option that can be set on any core field. When upgrading to ES 1.0 the existing mappings with a multi-field type automatically get replaced to a core field with the new `fields` option. If a `multi_field` type-ed field doesn't have a main / default field, a default field will be chosen for the multi fields syntax. The new main field type will be equal to the first `multi_field` fields' field or type string if no fields have been configured for the `multi_field` field and in both cases the default index will not be indexed (`index=no` is set on the default field). If a `multi_field` typed field has a default field, that field will replace the `multi_field` typed field. Closes to #4521	2014-01-13 09:21:53 +01:00
Florian Schilling	464037e0c1	Geo clean Up ============ The default unit for measuring distances is MILES in most cases. This commit moves ES over to the International System of Units and make it work on a default which relates to METERS . Also the current structures of the `GeoBoundingBox Filter` changed in order to define the Bounding by setting abitrary corners. Distances --------- Since the default unit for measuring distances has changed to a default unit `DistanceUnit.DEFAULT` relating to meters, the REST API has changed at the following places: * `ScriptDocValues.factorDistance()` returns meters instead of miles * `ScriptDocValues.factorDistanceWithDefault()` returns meters instead of miles * `ScriptDocValues.arcDistance()` returns meters instead of miles one might use `ScriptDocValues.arcDistanceInMiles()` * `ScriptDocValues.arcDistanceWithDefault()` returns meters instead of miles * `ScriptDocValues.distance()` returns meters instead of miles one might use `ScriptDocValues.distanceInMiles()` * `ScriptDocValues.distanceWithDefault()` returns meters instead of miles one might use `ScriptDocValues.distanceInMilesWithDefault()` * `GeoDistanceFilter` default unit changes from kilometers to meters * `GeoDistanceRangeFilter` default unit changes from miles to meters * `GeoDistanceFacet` default unit changes from miles to meters Geo Bounding Box Filter ----------------------- The naming of the GeoBoundingBoxFilter properties allows to set arbitrary corners (see #4084) namely `top_right`, `top_left`, `bottom_right` and `bottom_left`. This change also includes the fields `topRight` and `bottomLeft` Also it is be possible to set the single values by using just `top`, `bottom`, `left` and `right` parameters. Closes #4515, #4084	2014-01-11 21:30:29 +09:00
Simon Willnauer	bc5a9ca342	Rename edit_distance/min_similarity to fuzziness A lot of different API's currently use different names for the same logical parameter. Since lucene moved away from the notion of a `similarity` and now uses an `fuzziness` we should generalize this and encapsulate the generation, parsing and creation of these settings across all queries. This commit adds a new `Fuzziness` class that handles the renaming and generalization in a backwards compatible manner. This commit also added a ParseField class to better support deprecated Query DSL parameters The ParseField class allows specifying parameger that have been deprecated. Those parameters can be more easily tracked and removed in future version. This also allows to run queries in `strict` mode per index to throw exceptions if a query is executed with deprected keys. Closes #4082	2014-01-09 15:14:51 +01:00
Martijn van Groningen	7e341cefd0	Change the `sort` boolean option in percolate api to the sort dsl available in search api. Closes #4625	2014-01-09 09:58:34 +01:00
Clinton Gormley	2e4b70d40f	[DOCS] Fixed duplicate ID in highlighting	2014-01-09 00:37:18 +01:00
Nik Everett	bbf0ec52de	Add warning phrase suggester's max_errors large number can badly impact performance.	2014-01-08 23:06:41 +01:00
Nik Everett	8bd9e34e39	Stop FVH from throwing away some query boosts The FVH was throwing away some boosts on queries stopping a number of ways to boost phrase matches to the top of the list of fragments from working. The plain highlighter also doesn't work for this but that is because it doesn't support the concept of the same term having a different score at different positions. Also update documentation claiming that FHV is nicer for weighing terms found by query combinations. Closes #4351	2014-01-08 11:51:48 +01:00
Nik Everett	522d620eb6	Use FHV's phraseLimit This prevents poisoning the FVH with documents that contain TONS of matches which take tons of memory and time to highlight. Closes #4645	2014-01-08 11:27:58 +01:00
markharwood	602de04692	A GeoHashGrid aggregation that buckets GeoPoints into cells whose dimensions are determined by a choice of GeoHash resolution. Added a long-based representation of GeoHashes to GeoHashUtils for fast evaluation in aggregations. The new BucketUtils provides a common heuristic for determining the number of results to obtain from each shard in "top N" type requests.	2014-01-07 18:03:33 +00:00
Lee Hinman	2cb40fcb17	Rename "exists" to "found" in TermVector and Get responses - Adds the "created" field to the index action response - Reverses Delete class' notFound to Found to avoid double negative	2014-01-07 09:47:07 -07:00
Simon Willnauer	fa16969360	Cleanup comments and class names s/ElasticSearch/Elasticsearch * Clean up s/ElasticSearch/Elasticsearch on docs/* * Clean up s/ElasticSearch/Elasticsearch on src/* bin/* & pom.xml * Clean up s/ElasticSearch/Elasticsearch on NOTICE.txt and README.textile Closes #4634	2014-01-07 11:21:51 +01:00
Martijn van Groningen	32c5471d33	Rename `score` to `track_scores` in percolate api. Closes #4624	2014-01-06 14:57:39 +01:00
Martijn van Groningen	f1bf585089	The `fields` option should always return an array for json document fields and single valued field for metadata fields. Also the `fields` option can only be used to fetch leaf fields, trying to do fetch object fields will return in a client error. Closes #4542	2014-01-03 17:29:12 +01:00
Martijn van Groningen	f4bf0d5112	Replaced `ignore_indices` with `ignore_unavailable`, `expand_wildcards` and `allow_no_indices`. * `ignore_unavailable` - Controls whether to ignore if any specified indices are unavailable, this includes indices that don't exist or closed indices. Either `true` or `false` can be specified. * `allow_no_indices` - Controls whether to fail if a wildcard indices expressions results into no concrete indices. Either `true` or `false` can be specified. For example if the wildcard expression `foo` is specified and no indices are available that start with `foo` then depending on this setting the request will fail. This setting is also applicable when `_all`, `` or no index has been specified. * `expand_wildcards` - Controls to what kind of concrete indices wildcard indices expression expand to. If `open` is specified then the wildcard expression if expanded to only open indices and if `closed` is specified then the wildcard expression if expanded only to closed indices. Also both values (`open,closed`) can be specified to expand to all indices. Closes to #4436	2014-01-02 12:19:45 +01:00
Florian Schilling	bc452dff84	* setup accurate GeoDistance Function * adapt tests * introduced default GeoDistance function * Updated docs closes #4498	2013-12-27 19:15:19 +09:00
Chris Simpson	4f8c916eed	[Docs] Fix Typo Fixes small typo in the geo_distance aggregation docs.	2013-12-18 11:21:21 +01:00
Clinton Gormley	34b9b16233	[DOCS] Fixed some bad link refs	2013-12-16 18:07:33 +01:00
Martijn van Groningen	23d2b1ea7b	Renamed top level `filter` to `post_filter`. Closes #4119	2013-12-16 17:10:14 +01:00
Adrien Grand	36bd9cc432	Aggregations: Ordinals-based string bucketing support. When the ValuesSource has ordinals, terms ordinals are used as a cache key to bucket ordinals. This can make terms aggregations on String terms significantly faster. Close #4350	2013-12-13 15:34:02 +01:00
Martijn van Groningen	10e2528cce	Added the `force_source` option to highlighting that enforces to use of the _source even if there are stored fields. The percolator uses this option to deal with the fact that the MemoryIndex doesn't support stored fields, this is possible b/c the _source of the document being percolated is always present. Closes #4348	2013-12-13 13:39:53 +01:00
Martijn van Groningen	ebf6519965	Added aggs option to percolate api documentation.	2013-12-10 14:09:37 +01:00
Martijn van Groningen	8c1de501e7	Update percolator highlighting docs.	2013-12-07 16:40:49 -05:00
Adrien Grand	32eb5ffa92	[Docs] Document which encoding should be used in order to make sense of the offsets returned by the term vectors API. Close #4363	2013-12-06 22:39:08 +01:00
Markus Fischer	2da0611dfb	[DOCS] Completion suggest: Clarify de-duplication, optimize/merge This contribution is based on the feedback given in issue #4254 and issue #4255, and should clear things up, when suggestions are being removed and not displayed anymore after deletion of data.	2013-12-05 11:10:56 +01:00
Nik Everett	8e34057bc0	Add support for combining fields to the FVH The Fast Vector Highlighter can combine matches on multiple fields to highlight a single field using `matched_fields`. This is most intuitive for multifields that analyze the same string in different ways. Example: { "query": { "query_string": { "query": "content.plain:running scissors", "fields": ["content"] } }, "highlight": { "order": "score", "fields": { "content": { "matched_fields": ["content", "content.plain"], "type" : "fvh" } } } } Closes #3750	2013-12-03 11:10:01 +01:00
Clinton Gormley	d9a480c97a	[DOCS] Typos in aggregations	2013-12-02 15:14:25 +01:00
Conrad Pankoff	87246af256	[DOCS] Fixed typos and corrected grammar	2013-12-02 10:08:26 +01:00
uboness	cdc7dfbb2c	Changed the "script_lang" parameter to "lang" in all value source based aggs - to be consistent with all other script based APIs.	2013-12-02 02:01:03 +01:00
Clinton Gormley	bc393b6d79	Changed the minScore comparator from > to >= Closes #4303	2013-11-29 20:29:20 +01:00
uboness	0d6a35b9a7	- Added support for term filtering based on include/exclude regex on the terms agg - Added javadoc to the TermsBuilder Closes #4267	2013-11-29 13:46:48 +01:00
uboness	afb0d119e4	- Added docs for the value_count aggregation - Fixed typos in the terms facets docs - Fixed aggregation docs layout - Added docs for shard_size in term aggregation	2013-11-29 12:35:42 +01:00
Clinton Gormley	cdc1935b6e	[DOCS] Documented rest.action.multi.allow_explicit_index	2013-11-27 17:33:09 +01:00
Boaz Leskes	c63d8c4fb5	[Docs] Added _source filtering to documentation Relates to #3301	2013-11-26 19:16:24 +01:00
Britta Weber	dbef64009f	[DOC] add doc for multi term vector api closes #3998	2013-11-26 17:03:14 +01:00
Alexander Reelsen	bf74f49fdd	Updated Analyzing/Fuzzysuggester from lucene trunk * Minor alignments (like setter to ctor) * FuzzySuggester has a unicode aware flag, which is not exposed in the fuzzy completion request parameters * Made XAnalyzingSuggester flags (PAYLOAD_SEP, END_BYTE, SEP_LABEL) to be written into the postings format, so we can retain backwards compatibility * The above change also implies, that these flags can be set per instantiated XAnalyzingSuggester * CompletionPostingsFormatTest now uses a randomProvider for writing data to check for bwc	2013-11-26 12:52:06 +01:00
uboness	c7f6c5266d	initial commit of the aggregations module Closes #3300	2013-11-24 03:13:08 -08:00
Clinton Gormley	3465e69e83	[DOCS] Changed all store:yes/no to store:true/false which is how this setting is stored internally	2013-11-07 16:57:18 +01:00
Simon Willnauer	77bc5d5ecf	release [1.0.0.Beta1]	2013-11-06 15:32:43 +01:00
Shay Banon	7c32269f4f	Dist. Percolation: Use .percolator instead of _percolator for type name Use .percolator as the internal (hidden) type name for percolators within the index. Seems nicer name to represent "hidden" types within an index. closes #4090	2013-11-05 20:02:59 +01:00
Martijn van Groningen	30ab6f841d	[DOCS] Fixed percolate docs errors	2013-11-01 11:44:07 +01:00
Clinton Gormley	8b2efd4849	[DOCS] Added a version flag to percolation	2013-10-30 13:59:03 +01:00
Luca Cavanna	48ac9747a8	Added third highlighter type based on lucene postings highlighter Requires field index_options set to "offsets" in order to store positions and offsets in the postings list. Considerably faster than the plain highlighter since it doesn't require to reanalyze the text to be highlighted: the larger the documents the better the performance gain should be. Requires less disk space than term_vectors, needed for the fast_vector_highlighter. Breaks the text into sentences and highlights them. Uses a BreakIterator to find sentences in the text. Plays really well with natural text, not quite the same if the text contains html markup for instance. Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm. Uses forked version of lucene postings highlighter to support: - per value discrete highlighting for fields that have multiple values, needed when number_of_fragments=0 since we want to return a snippet per value - manually passing in query terms to avoid calling extract terms multiple times, since we use a different highlighter instance per doc/field, but the query is always the same The lucene postings highlighter api is quite different compared to the existing highlighters api, the main difference being that it allows to highlight multiple fields in multiple docs with a single call, ensuring sequential IO. The way it is introduced in elasticsearch in this first round is a compromise trying not to change the current highlight api, which works per document, per field. The main disadvantage is that we lose the sequential IO, but we can always refactor the highlight api to work with multiple documents. Supports pre_tag, post_tag, number_of_fragments (0 highlights the whole field), require_field_match, no_match_size, order by score and html encoding. Closes #3704	2013-10-24 23:38:00 +02:00
Luca Cavanna	e981e411d7	[DOCS] rephrased docs for highlight no_match_size parameter (removed 0.90.6 coming tag as it's needed only in 0.90 branch)	2013-10-24 14:38:32 +02:00
Nik Everett	14a709f563	Highlighting can return excerpt with no highlights You can configure the highlighting api to return an excerpt of a field even if there wasn't a match on the field. The FVH makes excerpts from the beginning of the string to the first boundary character after the requested length or the boundary_max_scan, whichever comes first. The Plain highlighter makes excerpts from the beginning of the string to the end of the last token before the requested length. Closes #1171	2013-10-24 14:38:32 +02:00
Markus Fischer	782d315da3	Fix markup	2013-10-21 16:11:09 +02:00
Clinton Gormley	b2d82d7e75	[DOCS] Reorganised the highlight_query docs and added a version flag	2013-10-18 18:03:31 +02:00
Nik Everett	60550e4cc2	phrase_len is not called phrase_length	2013-10-18 09:29:53 -04:00
Martijn van Groningen	1d0841e2b8	Added initial documentation for the redesigned percolator.	2013-10-16 14:12:19 +02:00
Clinton Gormley	9a062e465c	[DOCS] Reorganised common API conventions	2013-10-13 16:46:56 +02:00
Clinton Gormley	ba1b4886e3	[DOCS] Moved "named filters/queries" up one level	2013-10-10 11:23:08 +02:00
Clinton Gormley	264a00a40f	[DOCS] Added pages explaining lucene query parser syntax and regular expression syntax	2013-10-07 14:42:49 +02:00
Clinton Gormley	7a53d41446	[DOCS] Changed capitalization of operator in rescore query	2013-10-05 17:18:15 +02:00
uboness	f3c6108b71	introduced support for "shard_size" for terms & terms_stats facets. The "shard_size" is the number of term entries each shard will send back to the coordinating node. "shard_size" > "size" will increase the accuracy (both in terms of the counts associated with each term and the terms that will actually be returned the user) - of course, the higher "shard_size" is, the more expensive the processing becomes as bigger queues are maintained on a shard level and larger lists are streamed back from the shards. closes #3821	2013-10-02 22:02:00 +02:00
Nik Everett	6b000d8c6d	Support specifing score query on highlight. This is useful if you want to highlight terms not in the search query or you want sort highlighted snippets based on another query. Closes #3630	2013-10-02 15:46:24 -04:00
Lee Hinman	ba40aa374e	Uniquify anchor links to fix asciidoc/docbook generation	2013-09-30 15:32:00 -06:00
Lee Hinman	0442b737be	Add more anchor links to documentation Related to #3679	2013-09-30 13:13:16 -06:00

1 2 3 4 5 ...

266 Commits