OpenSearch

Commit Graph

Author	SHA1	Message	Date
Martijn van Groningen	bf21466291	CacheTests test fix.	2013-04-12 19:14:38 +02:00
Martijn van Groningen	80dbca0809	Field data: Try to load short values as byte values and load int values as short or byte values to reduce the size they take in memory.	2013-04-12 19:11:18 +02:00
Martijn van Groningen	5c90e5f940	If no options are specified with the clear cache api then all caches should be cleared. Closes #2886	2013-04-12 15:24:50 +02:00
Igor Motov	00c035f88c	Make sure that settings are propagated to all nodes	2013-04-11 10:59:14 -04:00
Martijn van Groningen	2dfcc3c740	Test that size is actually computed. Relates to #2882	2013-04-11 10:22:48 +02:00
Simon Willnauer	9a2d27a035	rename prefix_length to prefix_len for consistency Closes #2883	2013-04-10 17:39:32 +02:00
Martijn van Groningen	6a3c53ef44	Should prevent OOM	2013-04-10 10:00:51 +02:00
Martijn van Groningen	b8b28041e5	Fix for extended facets test.	2013-04-10 00:47:00 +02:00
Igor Motov	b0e44a2b40	Fix term counters in script field terms facet Fixes #2878	2013-04-09 12:42:35 -04:00
Simon Willnauer	ae74a8dbb7	Configure FieldData using a hash not a string Closes #2876	2013-04-09 15:53:05 +02:00
Simon Willnauer	374bbbfa7b	# FieldData Filter FieldData is an in-memory representation of the term dictionary in an uninverted form. Under certain circumstances this FieldData representation can grow very large on high-cardinality fields like tokenized full-text. Depending on the use-case filtering the terms that are hold in the FieldData representation can heavily improve execution performance and application stability. FieldData Filters can be applied on a per-segment basis. During FieldData loading the terms enumeration is passed through a filter predicate that either accepts or rejects a term. ## Frequency Filter The Frequency Filter acts as a high / low pass filter based on the document frequencies of a certain term within the segment that is loaded into field data. It allows to reject terms that are very high or low frequent based on absolute frequencies or percentages relative to the number of documents in the segment or more precise the number of document that have at least one value in the field that is loaded in the current segment. Here is an example mapping Here is an example mapping: ```json { "tweet" : { "properties" : { "locale" : { "type" : "string", "fielddata" : "format=paged_bytes;filter.frequency.min=0.001;filter.frequency.max=0.1", "index" : "analyzed", } } } } ``` ### Paramters * `filter.frequency.min` - the minimum document frequency (inclusive) in order to be loaded in to memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted. * `filter.frequency.max` - the maximum document frequency (inclusive) in order to be loaded in to memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted. * `filter.frequency.min_segment_size` - the minimum number of documents in a segment in order for the filter to be applied. Small segments might be omitted with this setting. ## Regular Expression Filter The regular expression filter applies a regular expression to each term during loading and only loads terms into memory that match the given regular expression. Here is an example mapping: ```json { "tweet" : { "properties" : { "locale" : { "type" : "string", "fielddata" : "format=paged_bytes;filter.regex=^en_.*", "index" : "analyzed", } } } } ``` Closes #2874	2013-04-09 11:34:48 +02:00
Simon Willnauer	a10c80e20f	ensure that modificatons to the enum order trigger test failures since we rely on the ordinal	2013-04-08 23:29:56 +02:00
Simon Willnauer	7e77ddb88f	use enum to represent flags and fail if flags are not respected	2013-04-08 22:56:11 +02:00
Shay Banon	15d7ae5983	FieldData Stats: Add field data stats to indices stats API closes #2870	2013-04-07 18:30:24 -07:00
Martijn van Groningen	86c1714bf3	Also test the `fields` option.	2013-04-07 21:52:19 +02:00
Shay Banon	84670212a6	Filter / Id Cache Stats: Add to Indices Stats API, revise node stats API closes #2862	2013-04-05 20:02:32 +02:00
Simon Willnauer	5e7ad9832c	Added more evil tests for different field data implementations	2013-04-05 18:12:50 +02:00
Martijn van Groningen	224faffead	Added an extended test for terms facet with a decent number of documents / field values and randomly tests various options. Also fixed an issue where `regex` and `excludes` were ignored when `all_terms` was used.	2013-04-05 17:38:46 +02:00
David Pilato	4b1ec037f8	Fix test for #2668 .	2013-04-05 15:00:28 +02:00
Martijn van Groningen	9b5c74d43e	Made sure `all_terms` works consistently. In some cases the `all_terms` option was ignored: * Faceting on number based fields. * The `execution_type` was set to `map`. * In the case the `fields` option was used. Closes #2861	2013-04-05 14:27:19 +02:00
David Pilato	36b92be212	List of existing plugins with Node Info API We want to display information about loaded plugins in Node Info API using plugin option: ```sh curl http://localhost:9200/_nodes?plugin=true ``` For example, on a 4 nodes cluster, it could provide the following output: ```javascript { "ok" : true, "cluster_name" : "test-cluster-MacBook-Air-de-David.local", "nodes" : { "lodYfbFTRnmwE6rjWGGyQQ" : { "name" : "node1", "transport_address" : "inet[/172.18.58.139:9300]", "hostname" : "MacBook-Air-de-David.local", "version" : "0.90.0.Beta2-SNAPSHOT", "http_address" : "inet[/172.18.58.139:9200]", "plugins" : [ ] }, "hJLXmY_NTrCytiIMbX4_1g" : { "name" : "node4", "transport_address" : "inet[/172.18.58.139:9303]", "hostname" : "MacBook-Air-de-David.local", "version" : "0.90.0.Beta2-SNAPSHOT", "http_address" : "inet[/172.18.58.139:9203]", "plugins" : [ { "name" : "test-plugin", "description" : "test-plugin description", "site" : true, "jvm" : false }, { "name" : "test-no-version-plugin", "description" : "test-no-version-plugin description", "site" : true, "jvm" : false }, { "name" : "dummy", "description" : "No description found for dummy.", "url" : "/_plugin/dummy/", "site" : false, "jvm" : true } ] }, "bnoySsBfTrSzbDRZ0BFHvg" : { "name" : "node2", "transport_address" : "inet[/172.18.58.139:9301]", "hostname" : "MacBook-Air-de-David.local", "version" : "0.90.0.Beta2-SNAPSHOT", "http_address" : "inet[/172.18.58.139:9201]", "plugins" : [ { "name" : "dummy", "description" : "This is a description for a dummy test site plugin.", "url" : "/_plugin/dummy/", "site" : false, "jvm" : true } ] }, "0Vwil01LSfK9YgRrMce3Ug" : { "name" : "node3", "transport_address" : "inet[/172.18.58.139:9302]", "hostname" : "MacBook-Air-de-David.local", "version" : "0.90.0.Beta2-SNAPSHOT", "http_address" : "inet[/172.18.58.139:9202]", "plugins" : [ { "name" : "test-plugin", "description" : "test-plugin description", "site" : true, "jvm" : false } ] } } } ``` Information are cached for 10 seconds by default. Modify `plugins.info_refresh_interval` property if needed. Setting `plugins.info_refresh_interval` to `-1` will cause infinite caching. Setting `plugins.info_refresh_interval` to `0` will disable caching. Closes #2668.	2013-04-05 11:36:56 +02:00
Simon Willnauer	f3e6fe094a	beef up term facet tests	2013-04-05 11:05:24 +02:00
Simon Willnauer	9fbe075aec	Added test that compares concurrent facet execution results with a serial execution result	2013-04-05 10:36:53 +02:00
Shay Banon	54f685674b	Thread Pool: Update default settings (move from default cached to fixed) closes #2858	2013-04-04 23:24:49 +02:00
Simon Willnauer	d758401add	Cleanup ScriptDocValues. This commit adds a getValues method to all ScriptDocValues for easy access in scripts via doc['field'].values / value.	2013-04-04 16:07:54 +02:00
Alexander Reelsen	4f96b36376	Returning configuration of root field mappers toXContent method only if they are enabled	2013-04-04 15:55:12 +02:00
Alexander Reelsen	955788e9a5	Allowing to disable size field mapper after enabling	2013-04-04 09:41:41 +02:00
Alexander Reelsen	e662e4d55d	Allowing to disable index field mapper after enabling	2013-04-04 09:41:41 +02:00
Alexander Reelsen	9cc2563d5e	Allowing to disable timestamp field mapper after enabling	2013-04-04 09:41:41 +02:00
Simon Willnauer	223ec2c42d	Beef up FieldData tests by running one on one duells	2013-04-03 18:38:25 +02:00
Igor Motov	356329df00	Improve stability of ClusterHealthTests	2013-04-03 12:07:42 -04:00
Igor Motov	d2f6349dcf	Improve stability of MinimumMasterNodesTests	2013-04-03 11:51:28 -04:00
Martijn van Groningen	0a89c80554	Fixed issue where a doc is omitted from the hits if it has no geo point and sorting is based on geo distance. Closes #2851	2013-04-03 17:25:16 +02:00
Simon Willnauer	eb8b38d027	Upgrade to Lucene 4.2.1	2013-04-03 12:22:39 +02:00
Martijn van Groningen	cf00acf5b0	If no specified index or alias exists and `ignore_indices` is set to `missing` an index missing error is returned instead of resolving to all open indices (e.g. when searching). This breaks backwards comp. with 0.20.x and before. Closes #2837	2013-04-02 19:06:17 +02:00
Alexander Reelsen	d866321c55	Merge pull request #2811 from spinscale/document-mapper-merge Allow to update ttl field mapping after initial creation. Fixes #2136	2013-04-01 23:37:29 -07:00
Simon Willnauer	7efa92636a	Cut over to IntsRef in favor of IntsArrayRef	2013-03-31 10:46:21 +02:00
Simon Willnauer	b3356d9f8d	remove dead code	2013-03-31 10:19:17 +02:00
Simon Willnauer	2a09342405	remove Bytes.java in favor of BytesRef / ArrayUtils	2013-03-31 08:54:39 +02:00
Simon Willnauer	fefa8da2ea	remove StringValues in favor of BytesValues	2013-03-30 17:35:23 +01:00
Simon Willnauer	dff2a9279c	clean-up double values	2013-03-30 17:35:23 +01:00
Simon Willnauer	d5c271acf5	clean-up long values	2013-03-30 17:35:22 +01:00
Simon Willnauer	5aedf74fb0	Remove getValues from numeric and string field data & clean up geo field data	2013-03-30 17:35:22 +01:00
Simon Willnauer	7f81469137	Refactor BytesValues to be reused as the interface for HashedBytesValues and remove HashBytesValues.	2013-03-30 17:35:22 +01:00
Simon Willnauer	129f02623b	Added FST based FieldData implementation holding all data in a per segment FST. This commit factors our a common API for BytesValues based impl to shared code and reduce code duplication.	2013-03-30 17:35:22 +01:00
Martijn van Groningen	a89dde8bac	Fixed `bool` filter bugs: * In the case only should clauses were specified with specific type of filters, the first clause determined which documents matched. * In some cases the minimum at least 1 should clause should match behaviour was broken.	2013-03-29 16:48:36 +01:00
Alexander Reelsen	a880a6c85e	Allow to update ttl field mapping after initial creation. Fixes #2136 Adding possibility to change TTL field mapper data without specifying enabled flag in mapping update	2013-03-28 17:25:28 +01:00
Martijn van Groningen	941aa17a43	Added sort mode to geo distance sorting. Closes #1846	2013-03-28 17:04:42 +01:00
Igor Motov	9bc50ea609	Fix LeastUsedDistributor and ensure random distribution for multiple non-fs directories If we cannot determine available space the fallback scenario is now to use random distribution instead of always using the last directory. Fixes #2820	2013-03-28 11:08:54 -04:00
Shay Banon	1fc37e5954	Segments API: Add version & compound for each segment closes #2823	2013-03-28 15:34:38 +01:00
Igor Motov	5bb75f9da3	Move applying alias filter to ContextSearch#preProcess()	2013-03-27 09:23:54 -04:00
Simon Willnauer	17f83f33bb	Terminate early when no terms left in the suggest string. Closes #2817	2013-03-26 17:44:34 +01:00
Igor Motov	9ae421a8b2	Fix filtering aliases with non-empty sort options Fixes #2816	2013-03-26 07:23:44 -04:00
Simon Willnauer	aa97c031f2	Don't reset tokenstream before passing to the MemoryIndex, otherwise some tokenizer might swallow tokens. Closes #2814	2013-03-25 22:46:11 +01:00
adavis	6a93fbcf07	Adding parsing for zero terms query for multi match Tests for multi-match zero_terms_query and making references to the ZeroTermsQuery enum consistent to others used in MultiMatchQueryBuilder	2013-03-23 08:59:39 +01:00
adavis	3f83904680	Fixes java6_u31 compile error w.r.t. type inference	2013-03-22 16:46:42 -07:00
Florian Schilling	1a67793a4b	Added Script test for geo distance tests and modified GeoUtils.normalizePoint()	2013-03-22 13:34:18 +01:00
Simon Willnauer	075779a397	Call onMissing if doc has no value in the field. Closes #2807	2013-03-21 22:45:17 +01:00
Simon Willnauer	064d272916	Respect offset and length when iterating over BytesRef in Uid. The length is starting at offset Closes #2806	2013-03-21 19:29:05 +01:00
Florian Schilling	f08d458545	# GeoShape Precision The `geo_shape` precision could be only set via `tree_levels` so far. A new option `precision` now allows to set the levels of the underlying tree structures to be set by distances like `50m`. The def ## Example ```json curl -XPUT 'http://127.0.0.1:9200/myindex/' -d '{ "mappings" : { "type1": { "dynamic": "false", "properties": { "location" : { "type" : "geo_shape", "geohash" : "true", "store" : "yes", "precision":"50m" } } } } }' ``` ## Changes - GeoUtils defines the [WGS84](http://en.wikipedia.org/wiki/WGS84) reference ellipsoid of earth - DistanceUnits refer to a more precise definition of earth circumference - DistanceUnits for inch, yard and meter have been defined - Set default levels in GeoShapeFieldMapper to 50m precision Closes #2803	2013-03-20 14:52:47 +01:00
Simon Willnauer	4705eb2959	Lazily initialize the delegate in BloomFilteredPostingsFormat to prevent unnecessary loading if bloomfilter terminates early	2013-03-20 12:43:17 +01:00
Simon Willnauer	747ce36915	Specialise the default codec to reuse Lucene41 files in the common case. Closes #2799	2013-03-20 12:43:17 +01:00
Shay Banon	7d9cef904b	Field Data: optimize long type to use narrowest possible type automatically closes #2795	2013-03-18 12:37:15 +01:00
Simon Willnauer	c25eb7defe	Fix bug in RateLimiter.SimpleRateLimiter causing numeric overflow in StoreStats Closes #2785	2013-03-15 23:36:31 +01:00
Florian Schilling	25bd9cecd0	# REST Suggester API The REST Suggester API binds the 'Suggest API' to the REST Layer directly. Hence there is no need to touch the query layer for requesting suggestions. This API extracts the Phrase Suggester API and makes 'suggestion request' top-level objects in suggestion requests. The complete API can be found in the underlying ["Suggest Feature API"](http://www.elasticsearch.org/guide/reference/api/search/suggest.html). # API Example The following examples show how Suggest Actions work on the REST layer. According to this a simple request and its response will be shown. ## Suggestion Request ```json curl -XPOST 'localhost:9200/_suggest?pretty=true' -d '{ "text" : "Xor the Got-Jewel", "simple_phrase" : { "phrase" : { "analyzer" : "bigram", "field" : "bigram", "size" : 1, "real_word_error_likelihood" : 0.95, "max_errors" : 0.5, "gram_size" : 2 } } }' ``` This example shows how to query a suggestion for the global text 'Xor the Got-Jewel'. A 'simple phrase' suggestion is requested and a 'direct generator' is configured to generate the candidates. ## Suggestion Response On success the request above will reply with a response like the following: ```json { "simple_phrase" : [ { "text" : "Xor the Got-Jewel", "offset" : 0, "length" : 17, "options" : [ { "text" : "xorr the the got got jewel", "score" : 3.5283546E-4 } ] } ] } ``` The 'suggest'-response contains a single 'simple phrase' which contains an 'option' in turn. This option represents a suggestion of the queried text. It contains the corrected text and a score indicating the probability of this option to be meant. Closes #2774	2013-03-13 19:36:29 +01:00
Alexander Reelsen	125b33d3dc	GeoJSONShapeParser parses JSON correctly and extracts coordinates even if 'crs' field is included. Fixes #2763	2013-03-13 15:17:21 +01:00
Simon Willnauer	365cde82d3	Use numOrds rather than numDocs as upperbound for sorting Closes #2773	2013-03-13 15:13:56 +01:00
Igor Motov	3a534c64e5	Add dynamic settings validation Fixes #2749	2013-03-12 14:41:00 -04:00
Simon Willnauer	c5395436e6	fix test bug where a small time window exists that can trigger a false failure due to default concurrent recoveries	2013-03-12 14:48:14 +01:00
Simon Willnauer	237c4ddf54	Introdue ParentIdCollector that collects only if the parent ID is non null ie. if the document has a parent. Closes #2744	2013-03-11 21:11:05 +01:00
Simon Willnauer	9442c41481	enable testcase that relied on a Lucene 4.2 fix	2013-03-11 12:57:24 +01:00
Simon Willnauer	4e7cff488e	add test that ensures that we bumb the version on a Lucene Upgrade	2013-03-11 10:30:48 +01:00
Simon Willnauer	ebadd9ebbd	Fix tests since Lucene 4.2 we can support date math in Fuzzy-Search Syntax	2013-03-11 08:23:01 +01:00
Simon Willnauer	a37f1f55cc	Add tests for highlighting boost query. Closes #1314	2013-03-11 08:23:01 +01:00
Simon Willnauer	11bf7a8b1a	Upgrade to Lucene 4.2	2013-03-11 08:23:01 +01:00
Simon Willnauer	75fd6d4985	Added KeywordRepeatFilter that allows emitting stemmed and unstemmed versions of the same token if stemmers are used Closes #2753	2013-03-09 23:09:59 +01:00
Simon Willnauer	dc9a052287	Respect CandidateGenerator#size if set in the request and reduce the total #of candidates to the shard size. Closes #2752	2013-03-09 13:36:40 +01:00
Shay Banon	eb956e7c09	Term/Terms filters on numeric fields gives wrong result fixes #2746	2013-03-07 22:12:22 -08:00
Benjamin Devèze	35f5ca915d	Add support for ignore_indices to delete by query Closes #2734	2013-03-07 10:17:51 +01:00
Shay Banon	e1409a9f0e	Problems with range searches for time with lte fixes #2731	2013-03-05 18:10:30 -08:00
Igor Motov	acff102234	Implement search shards API Closes #2726	2013-03-05 09:17:59 -05:00
Simon Willnauer	1eb24d7efc	use a base ShingleFilterFactory to simplify default shingle detection	2013-03-05 12:32:50 +01:00
Simon Willnauer	876b5a3dcd	prefer totalTermFrequency over docFreq in PhraseSuggester	2013-03-05 10:46:25 +01:00
Simon Willnauer	315744be55	Set shardSize according to the total size if not explicitly specified. Closes #2729	2013-03-05 09:22:23 +01:00
Shay Banon	3e264f6b95	cleanup deletion of content in shards we are very conservative on when we delete data, remove the actual options of deleting data that are not used	2013-03-04 20:41:19 -08:00
Drew Raines	a8d52b58b6	Remove obsolete test.	2013-03-04 15:22:40 -06:00
Benjamin Devèze	09f20e3d4c	Fix bug when searching concrete and routing aliased indices Closes #2683	2013-03-03 14:31:57 +01:00
uboness	881cb7900c	Change geo_shapes support: * Exposed the spatial strategy to be configurable as part of the geo_shape mappings * Exposed the spatial strategy to be customizable at query time (will be used to generate the geo_shape filter/query) * Removed XTermQueryPrefixTreeStrategy and reverted to use the lucene TermQueryPrefixTreeStrategy instead * Made the RecursivePrefixTreeStrategy the default strategy to be used * Removed support for all spatial operations except "intersects" * Updated both the GeoShapeQueryBuilder and GeoShapeFilterBuilder with all the changes (removed the option of specifying the operation type (as only intersects is supported) and added the option of setting the filter/query spatial strategy Closes #2720	2013-03-02 17:13:58 +01:00
Shay Banon	50d121315b	add ability for cluster health to wait for current events to be processed help with tests that run on slow machines	2013-03-02 14:25:45 +01:00
Shay Banon	ea097afd91	add proper testing for bool filter	2013-03-02 01:07:05 +01:00
Shay Banon	361d6bf89a	spin a bit to wait for condition in test, so slow machines will still run it correctly	2013-03-01 23:36:13 +01:00
Shay Banon	9b68e98ea2	more strict check before trying to parse and detect a string as a date fixes #2694	2013-03-01 22:15:32 +01:00
Simon Willnauer	aaa3c48b3c	Throw IAE if indices is null or contains a null value. Closes #2656	2013-03-01 21:26:23 +01:00
Simon Willnauer	fced68c22d	ensure that suggestion only added on reduce if they are present in the shard response	2013-03-01 21:09:10 +01:00
Martijn van Groningen	d99b532f0f	Supporting sort modes `avg` and `sum` when sorting inside nested objects. Previously this commit either sort modes `min` or `max` (depending on sort order) was used when sort modes `avg` and `sum` were picked. Closes #2701	2013-03-01 19:53:20 +01:00
Simon Willnauer	39f362326e	Short Curcuit response if no indices exits and make sure listener is notified. Closes #2692	2013-03-01 15:15:56 +01:00
Simon Willnauer	3c1f291801	Fail in metadata parsing if the id path is not a value but rather an array or an object. Closes #2275	2013-03-01 13:00:29 +01:00
Shay Banon	30075bb6f9	add info in test for actual search failures	2013-03-01 00:00:09 +01:00
Shay Banon	849a3677cd	improve timing in test to wait for state with graceful timeouts (yet, validate early and exit when relevant)	2013-02-28 23:44:52 +01:00
Simon Willnauer	c90c5cbf85	fix bug in StupidBackoffScorer were previous word and current word were flipped creating non-existing bigram	2013-02-28 21:23:41 +01:00
Simon Willnauer	d4ec03ed76	# Phrase Suggester The `term` suggester provides a very convenient API to access word alternatives on token basis within a certain string distance. The API allows accessing each token in the stream individually while suggest-selection is left to the API consumer. Yet, often already ranked / selected suggestions are required in order to present to the end-user. Inside ElasticSearch we have the ability to access way more statistics and information quickly to make better decision which token alternative to pick or if to pick an alternative at all. This `phrase` suggester adds some logic on top of the `term` suggester to select entire corrected phrases instead of individual tokens weighted based on a ngram-langugage models. In practice it will be able to make better decision about which tokens to pick based on co-occurence and frequencies. The current implementation is kept quite general and leaves room for future improvements. # API Example The `phrase` request is defined along side the query part in the json request: ```json curl -s -XPOST 'localhost:9200/_search' -d { "suggest" : { "text" : "Xor the Got-Jewel", "simple_phrase" : { "phrase" : { "analyzer" : "body", "field" : "bigram", "size" : 1, "real_word_error_likelihood" : 0.95, "max_errors" : 0.5, "gram_size" : 2, "direct_generator" : [ { "field" : "body", "suggest_mode" : "always", "min_word_len" : 1 } ] } } } } ``` The response contains suggested sored by the most likely spell correction first. In this case we got the expected correction `xorr the god jewel` first while the second correction is less conservative where only one of the errors is corrected. Note, the request is executed with `max_errors` set to `0.5` so 50% of the terms can contain misspellings (See parameter descriptions below). ```json { "took" : 37, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 2938, "max_score" : 0.0, "hits" : [ ] }, "suggest" : { "simple_phrase" : [ { "text" : "Xor the Got-Jewel", "offset" : 0, "length" : 17, "options" : [ { "text" : "xorr the god jewel", "score" : 0.17877324 }, { "text" : "xor the god jewel", "score" : 0.14231323 } ] } ] } } ```` # Phrase suggest API ## Basic parameters * `field` - the name of the field used to do n-gram lookups for the language model, the suggester will use this field to gain statistics to score corrections. * `gram_size` - sets max size of the n-grams (shingles) in the `field`. If the field doesn't contain n-grams (shingles) this should be omitted or set to `1`. * `real_word_error_likelihood` - the likelihood of a term being a misspelled even if the term exists in the dictionary. The default it `0.95` corresponding to 5% or the real words are misspelled. * `confidence` - The confidence level defines a factor applied to the input phrases score which is used as a threshold for other suggest candidates. Only candidates that score higher than the threshold will be included in the result. For instance a confidence level of `1.0` will only return suggestions that score higher than the input phrase. If set to `0.0` the top N candidates are returned. The default is `1.0`. * `max_errors` - the maximum percentage of the terms that at most considered to be misspellings in order to form a correction. This method accepts a float value in the range `[0..1)` as a fraction of the actual query terms a number `>=1` as an absolut number of query terms. The default is set to `1.0` which corresponds to that only corrections with at most 1 misspelled term are returned. * `separator` - the separator that is used to separate terms in the bigram field. If not set the whitespce character is used as a separator. * `size` - the number of candidates that are generated for each individual query term Low numbers like `3` or `5` typically produce good results. Raising this can bring up terms with higher edit distances. The default is `5`. * `analyzer` - Sets the analyzer to analyse to suggest text with. Defaults to the search analyzer of the suggest field passed via `field`. * `shard_size` - Sets the maximum number of suggested term to be retrieved from each individual shard. During the reduce phase the only the top N suggestions are returned based on the `size` option. Defaults to `5`. * `text` - Sets the text / query to provide suggestions for. ## Smoothing Models The `phrase` suggester supports multiple smoothing models to balance weight between infrequent grams (grams (shingles) are not existing in the index) and frequent grams (appear at least once in the index). * `laplace` - the default model that uses an additive smoothing model where a constant (typically `1.0` or smaller) is added to all counts to balance weights, The default `alpha` is `0.5`. * `stupid_backoff` - a simple backoff model that backs off to lower order n-gram models if the higher order count is `0` and discounts the lower order n-gram model by a constant factor. The default `discount` is `0.4`. * `linear_interpolation` - a smoothing model that takes the weighted mean of the unigrams, bigrams and trigrams based on user supplied weights (lambdas). Linear Interpolation doesn't have any default values. All parameters (`trigram_lambda`, `bigram_lambda`, `unigram_lambda`) must be supplied. ## Candidate Generators The `phrase` suggester uses candidate generators to produce a list of possible terms per term in the given text. A single candidate generator is similar to a `term` suggester called for each individual term in the text. The output of the generators is subsequently scored in in combination with the candidates from the other terms to for suggestion candidates. Currently only one type of candidate generator is supported, the `direct_generator`. The Phrase suggest API accepts a list of generators under the key `direct_generator` each of the generators in the list are called per term in the original text. ## Direct Generators The direct generators support the following parameters: * `field` - The field to fetch the candidate suggestions from. This is an required option that either needs to be set globally or per suggestion. * `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field. * `size` - The maximum corrections to be returned per suggest text token. * `suggest_mode` - The suggest mode controls what suggestions are included or controls for what suggest text terms, suggestions should be suggested. Three possible values can be specified: * `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default. * `popular` - Only suggest suggestions that occur in more docs then the original suggest text term. * `always` - Suggest any matching suggestions based on terms in the suggest text. * `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value result in an bad request error being thrown. Defaults to 2. * `min_prefix` - The number of minimal prefix characters that must match in order be a candidate suggestions. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms. * `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4. * `max_inspections` - A factor that is used to multiply with the `shards_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5. * `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option. * `max_query_frequency` - The maximum threshold in number of documents a sugges text token can exist in order to be included. Can be a relative percentage number (e.g 0.4) or an absolute number to represent document frequencies. If an value higher than 1 is specified then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly on top of this this also improves the spellcheck performance. The shard level document frequencies are used for this option. * pre_filter - a filter (analyzer) that is applied to each of the tokens passed to this candidate generator. This filter is applied to the original token before candidates are generated. (optional) * post_filter - a filter (analyzer) that is applied to each of the generated tokens before they are passed to the actual phrase scorer. (optional) The following example shows a `phrase` suggest call with two generators, the first one is using a field containing ordinary indexed terms and the second one uses a field that uses terms indexed with a `reverse` filter (tokens are index in reverse order). This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions. The `pre_filter` and `post_filter` options accept ordinary analyzer names. ```json curl -s -XPOST 'localhost:9200/_search' -d { "suggest" : { "text" : "Xor the Got-Jewel", "simple_phrase" : { "phrase" : { "analyzer" : "body", "field" : "bigram", "size" : 4, "real_word_error_likelihood" : 0.95, "confidence" : 2.0, "gram_size" : 2, "direct_generator" : [ { "field" : "body", "suggest_mode" : "always", "min_word_len" : 1 }, { "field" : "reverse", "suggest_mode" : "always", "min_word_len" : 1, "pre_filter" : "reverse", "post_filter" : "reverse" } ] } } } } ``` `pre_filter` and `post_filter` can also be used to inject synonyms after candidates are generated. For instance for the query `captain usq` we might generate a candidate `usa` for term `usq` which is a synonym for `america` which allows to present `captain america` to the user if this phrase scores high enough. Closes #2709	2013-02-28 16:17:59 +01:00
Shay Banon	7400c30eba	fail a shard if a merge failure occurs	2013-02-27 23:44:55 +01:00
Simon Willnauer	7be8f431d5	move id tests into SimpleQueryTests	2013-02-27 19:03:42 +01:00
Simon Willnauer	8ab602ec81	Fix AIOOB exception in UID type/id tuple creation. Closes #2695	2013-02-27 18:58:27 +01:00
Martijn van Groningen	2b5e3f5586	Fixed resolving closest nested object when sorting on a field inside nested object	2013-02-25 16:21:22 +01:00
Shay Banon	bde36647fb	Terms/Ids filter: Support empty list of values, resulting in no match for it closes #2687 also closes #2686	2013-02-25 12:26:49 +01:00
Shay Banon	4145d154bb	add a test for empty lookup terms filter	2013-02-25 11:58:58 +01:00
Shay Banon	595e0e254e	[Code refactoring] IndicesStats -> IndicesStatsResponse fixes #1782	2013-02-23 14:23:36 +01:00
David Pilato	4c493ac71d	Revert changes on *Request classes from issue Relative to #2657	2013-02-23 10:37:56 +01:00
David Pilato	a646e126e9	Display list of all available site plugins on /_plugins/ end point fix #2664	2013-02-23 09:34:06 +01:00
Igor Motov	b8cc8e56c4	Improve stability of SimpleRobinEngineTests	2013-02-22 14:59:49 -05:00
Shay Banon	03fdc6aa80	Query DSL: Terms filter to allow for terms lookup from another document closes #2674	2013-02-22 14:04:10 +01:00
Igor Motov	ec3492c67c	Improve stability of the testReusePeerRecovery test	2013-02-21 16:06:33 -05:00
Igor Motov	ce6f0e27bf	Make file distribution among several disks configurable Fixes #2650	2013-02-19 21:43:43 -05:00
David Pilato	b7afa0f44e	Fix test for Support trailing slashes on plugin _site URLs #2654	2013-02-19 21:16:47 +01:00
Martijn van Groningen	3b31c1216e	Made the `term_vector` json field the leading way of configuring term vectors. Supported options: `no`, `yes`, `with_offsets`, `with_positions`, `with_positions_offsets` and`with_positions_offsets_payloads`.	2013-02-19 20:55:43 +01:00
Igor Motov	5b9e9a004a	Make sure that in SitePluginTests http client connects to the correct node and closes the node after the test	2013-02-19 14:42:24 -05:00
Igor Motov	d126558dec	Add check for health timeout to shardCleanup test	2013-02-19 13:12:26 -05:00
David Pilato	8ab9d2dd1f	Support trailing slashes on plugin _site URLs fix #2654	2013-02-19 09:21:45 +01:00
Igor Motov	cfaa859bb2	Improve stability of UpdateNumberOfReplicasTests	2013-02-18 20:12:39 -05:00
Igor Motov	5746c50ef9	Improve stability of shardsCleanup test	2013-02-18 19:35:12 -05:00
Igor Motov	183a74c866	Improve stability of testSimpleAwareness test	2013-02-18 19:31:07 -05:00
Martijn van Groningen	303e87fb69	Added support for sorting by fields inside one or more nested objects. The sorting by nested field support has the following parameters on top of the already existing sort options: nested_path - Defines the on what nested object to sort. The actual sort field must be a direct field inside this nested object. The default is to use the most immediate inherited nested object from the sort field. nested_filter - A filter the inner objects inside the nested path should match with in order for its field values to be taken into account by sorting. Common case is to repeat the query / filter inside the nested filter or query. By default no nested_filter is active. Either the highest (max) or lowest (min) inner object is picked for during sorting depending on the sort_mode being used. The sort_mode options avg and sum can still be used for number based fields inside nested objects. All the values for the sort field are taken into account for each nested object. Closes #2662	2013-02-18 22:10:41 +01:00
Simon Willnauer	8db436f107	Remove backported Lucene 4 spatial code in favor of the released version in Lucene 4.1	2013-02-18 18:43:55 +01:00
Jeffrey Gerard	0dfc2169d7	Added Testcase and BugFix fixing #2626 where GeoShape intersects filter omitted matching docs. SpatialPrefixTree#recursiveGetNodes uses an optimization that prevents recursion into the deepest tree level if a parent node in the penultimate level covers all its children. This produces a bug if the optimization happens both at indexing and at query/filter time. This patch fixes the bug by disabling the optimization at indexing time (to avoid adding overhead for query-heavy workloads). See LUCENE-4770 for reference	2013-02-18 18:43:47 +01:00
David Pilato	cc83c2f848	refactoring getter/setters Fixes #2657	2013-02-18 11:09:32 -05:00
Martijn van Groningen	ac2e6a3a4d	Fixed nested facets with filters.	2013-02-18 11:01:18 -05:00
Simon Willnauer	24291d40f4	Expose CJKWidthTokenFilter and CJKBigramTokenFilter Closes #2660	2013-02-18 11:01:17 -05:00
Shay Banon	547bd7abf2	add our own bloom filter implementation uses more hash iterations, yet require less memory for the same fpp relates to #2411	2013-02-18 11:01:17 -05:00
Shay Banon	73a447da86	initial facet refactoring the main goal of the facet refactoring is to allow for two modes of facet execution, collector based, that get callbacks as hist match, and post based, which iterates over all the relevant hits it also includes a some simplification of the facet implementation	2013-02-16 02:25:04 +01:00
Shay Banon	06b82a45d4	Simplified range syntax when using a query string closes #2655	2013-02-15 01:30:55 +01:00
Shay Banon	4714a6acc9	Clear cache: allow to invalidate specific filter cache keys closes #2653	2013-02-14 21:13:19 +01:00
Igor Motov	37f16127c5	Fix ScriptFilter cache key calculation Fixes #2651	2013-02-14 06:13:26 -05:00
Shay Banon	f41eccc7a5	updating non dynamic settings throws an error now	2013-02-13 14:28:16 +01:00
Shay Banon	5519f80abb	add increased timeout waiting for relocation when running on small boxes	2013-02-12 21:23:18 +01:00
Martijn van Groningen	fc13499ff5	Added `sort_mode` option that defines what value to pick in the case the sort field is multi-valued. The `min` and `max` sort modes are supported for all field types. Either the lowest value or the highest value is picked. In addition to that number based fields also support `sum` and `avg` as sort mode. If `sum` sort mode is used then all the values for a field and belonging to a document are added together and the result of that is used as sort value. If the `avg` sort mode is used then the average of all values for the sort field belonging to that document is used as sort value. Relates to #2634	2013-02-12 20:38:24 +01:00
Shay Banon	7d13545e33	delete indices before running the tests	2013-02-12 19:28:48 +01:00
Shay Banon	668bcd0eb7	Bulk execution while a shard is replication might send erroneous version conflict failures for certain items fixes #2642	2013-02-12 17:38:06 +01:00
Simon Willnauer	a7bbab7e87	# Rescore Feature The rescore feature allows te rescore a document returned by a query based on a secondary algorithm. Rescoring is commonly used if a scoring algorithm is too costly to be executed across the entire document set but efficient enough to be executed on the Top-K documents scored by a faster retrieval method. Rescoring can help to improve precision by reordering a larger Top-K window than actually returned to the user. Typically is it executed on a window between 100 and 500 documents while the actual result window requested by the user remains the same. # Query Rescorer The `query` rescorer executes a secondary query only on the Top-K results of the actual user query and rescores the documents based on a linear combination of the user query's score and the score of the `rescore_query`. This allows to execute any exposed query as a `rescore_query` and supports a `query_weight` as well as a `rescore_query_weight` to weight the factors of the linear combination. # Rescore API The `rescore` request is defined along side the query part in the json request: ```json curl -s -XPOST 'localhost:9200/_search' -d { "query" : { "match" : { "field1" : { "query" : "the quick brown", "type" : "boolean", "operator" : "OR" } } }, "rescore" : { "window_size" : 50, "query" : { "rescore_query" : { "match" : { "field1" : { "query" : "the quick brown", "type" : "phrase", "slop" : 2 } } }, "query_weight" : 0.7, "rescore_query_weight" : 1.2 } } } ``` Each `rescore` request is executed on a per-shard basis within the same roundtrip. Currently the rescore API has only one implementation (the `query` rescorer) which modifies the result set in-place. Future developments could include dedicated rescore results if needed by the implemenation ie. a pair-wise reranker. Note: Only regualr queries are rescored, if the search type is set to `scan` or `count` rescorers are not executed. Closes #2640	2013-02-12 17:10:00 +01:00
Shay Banon	c65aff7775	Index with no replicas might loose on going documents while relocating a shard fixes #26421	2013-02-12 17:03:28 +01:00
uboness	a2b87e28f6	fixed a bug in PrioritizedThreadPoolExecutor: now execute(Runnable) satisfies the priority and fifo nature of same-priority runnables	2013-02-09 04:20:16 +01:00
uboness	6d9048f8cc	added priority support for cluster state updates: * URGENT: * cluster_reroute (api) * refresh-mapping * cluster_update_settings * reroute_after_cluster_update_settings * create-index * delete-index * index-aliases * remove-index-template * create-index-template * update-mapping * remove-mapping * put-mapping * open-index * close-index * update-settings * HIGH * routing-table-updater * zen-disco-node_left * zen-disco-master_failed * shard-failed * shard-started * NORMAL * all other actions	2013-02-09 01:14:57 +01:00
Simon Willnauer	f5331c9535	Cleanup NumericFieldData. FieldData interfaces are reduced to long and double while internal represenations still operate on the actual datatypes.	2013-02-08 20:58:36 +01:00
Martijn van Groningen	1189a2c2c2	Extended mv sorting integration test	2013-02-08 15:24:56 +01:00
Martijn van Groningen	8c7779057c	Added sort by field that have multiple values per document. Closes #2634	2013-02-08 13:28:40 +01:00
Martijn van Groningen	f97021b165	Fixes size assertion failure.	2013-02-07 16:50:54 +01:00
Martijn van Groningen	e72e323c8a	Attempt to fix "No active shards" failure	2013-02-07 10:14:10 +01:00
Lee Hinman	ed43ad07d7	Throw a more meaningful message when no document is specified for indexing	2013-02-06 22:33:02 +01:00
Florian Schilling	a52e01f3e5	Remove XTermsFilter and UidFilter in favour of Lucene 4.1 TermsFilter	2013-02-06 18:45:05 +01:00
Igor Motov	6890c9fa62	Move action.wait_on_mapping_change setting to pom	2013-02-06 11:48:58 -05:00
Igor Motov	ed09ba0a18	Improve stability of RecoveryPercolatorTests Without "action.wait_on_mapping_change" setting set to true, the test node might get shutdown before updated mapping is saved.	2013-02-05 14:53:46 -05:00
Igor Motov	8277833f8d	Fix settings processing in WordDelimiterTokenFilterFactory	2013-02-05 10:03:00 -05:00
Martijn van Groningen	19295280d9	Made sure that wrapped child query / parent query gets rewritten only once.	2013-02-05 10:27:31 +01:00
Igor Motov	9e89323ad2	Add proper cleanup to InternalSettingsPerparerTests	2013-02-04 19:58:40 -05:00
Martijn van Groningen	8109d13733	Use CacheRecycler when resolving parent docs in TopChildrenQuery.	2013-02-04 12:46:30 +01:00
Martijn van Groningen	9c3a86875b	Removed `execution_type` for has_child and has_parent.	2013-02-04 11:37:40 +01:00
Shay Banon	a8c9e580ed	add getMaxOrd, and properly document the difference between it and numOrds	2013-02-01 16:13:13 +01:00
Igor Motov	45b2bff8da	Improve SearchStatsTests Added refresh to guarantee that at least something will be fetched on a fast computer.	2013-01-31 21:19:08 -05:00
Igor Motov	3c9541dd14	Make facet and sort tests more reliable in case of multiple nodes and shards Stats, histogram and range facets and sorting currently fail if a field that they are running on is not defined in the mapping. In case of dynamic fields it might mean that by the time the facet query is executed the new field mapping might not be propagated to all nodes yet.	2013-01-31 21:19:07 -05:00
Igor Motov	6a01e7882c	Improve shardsCleanup test When startNode exits there is no guarantee that shard cleanup is finished because the cleanup operation is performed on another thread and startNode doesn't wait for it to complete. Therefore we might need to wait for the shard to disappear.	2013-01-31 21:18:14 -05:00
Igor Motov	e32efba3d8	Improve RecoverAfterNodes tests	2013-01-31 20:05:55 -05:00
Simon Willnauer	1a1df06411	Move OrdsBuilding into a dedicated class and abstract integer pools used to build sparse ordinals	2013-01-31 19:02:31 +01:00
Martijn van Groningen	46dd42920c	Remove scope support in query and facet dsl. Remove support for the `scope` field in facets and `_scope` field in the nested and parent/child queries. The scope support for nested queries will be replaced by the `nested` facet option and a facet filter with a nested filter. The nested filters will now support the a `join` option. Which controls whether to perform the block join. By default this enabled, but when disabled it returns the nested documents as hits instead of the joined root document. Search request with the current scope support. ``` curl -s -XPOST 'localhost:9200/products/_search' -d '{ "query" : { "nested" : { "path" : "offers", "query" : { "match" : { "offers.color" : "blue" } }, "_scope" : "my_scope" } }, "facets" : { "size" : { "terms" : { "field" : "offers.size" }, "scope" : "my_scope" } } }' ``` The following will be functional equivalent of using the scope support: ``` curl -s -XPOST 'localhost:9200/products/_search?search_type=count' -d '{ "query" : { "nested" : { "path" : "offers", "query" : { "match" : { "offers.color" : "blue" } } } }, "facets" : { "size" : { "terms" : { "field" : "offers.size" }, "facet_filter" : { "nested" : { "path" : "offers", "query" : { "match" : { "offers.color" : "blue" } }, "join" : false } }, "nested" : "offers" } } }' ``` The scope support for parent/child queries will be replaced by running the child query as filter in a global facet. Search request with the current scope support: ``` curl -s -XPOST 'localhost:9200/products/_search' -d '{ "query" : { "has_child" : { "type" : "offer", "query" : { "match" : { "color" : "blue" } }, "_scope" : "my_scope" } }, "facets" : { "size" : { "terms" : { "field" : "size" }, "scope" : "my_scope" } } }' ``` The following is the functional equivalent of using the scope support with parent/child queries: ``` curl -s -XPOST 'localhost:9200/products/_search' -d '{ "query" : { "has_child" : { "type" : "offer", "query" : { "match" : { "color" : "blue" } } } }, "facets" : { "size" : { "terms" : { "field" : "size" }, "global" : true, "facet_filter" : { "term" : { "color" : "blue" } } } } }' ``` Closes #2606	2013-01-31 15:09:57 +01:00
Martijn van Groningen	355381962b	Use only the 'test' index, instead of all indices for child search benchmark.	2013-01-31 13:12:33 +01:00
Shay Banon	6cec73c201	remove fuzzy factor from mapping (internally implemented) we want to support ~ notion in query parser for types other than strings, we are getting there, one can do now age:10~5, we would love to support it for dates, as in timestamp:2012-10-10~5d, but that requires changes in the query parser to support strings after the ~ sign	2013-01-31 12:23:03 +01:00
Igor Motov	8df7f2af0d	Improve testReusePeerRecovery test	2013-01-30 19:51:41 -05:00
Igor Motov	29f4274213	Add index cleanup if index creation fails Fixes #2590	2013-01-30 10:40:01 -05:00
Martijn van Groningen	bc20f068c9	Made `search_analyzer` updateable via put mapping api. Closes #2604	2013-01-30 11:49:20 +01:00
Simon Willnauer	5df37eaf75	add more advanced tests for phrase_prefix	2013-01-30 10:51:05 +01:00
Simon Willnauer	0697e2f23e	use index prefix in tests to prevent misconfiguration	2013-01-28 15:51:06 +01:00
Simon Willnauer	72a2416a8c	Support MultiPhrasePrefixQuery and MultiPhraseQuery in highlighters Closes #2596	2013-01-28 15:41:25 +01:00
Martijn van Groningen	2e68207d6d	Updated suggest api. # Suggest feature The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment there the only supported suggester is `fuzzy`. The suggest feature is available from version `0.21.0`. # Fuzzy suggester The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query into account that is part of request. # Suggest API The suggest request part is defined along side the query part as top field in the json request. ``` curl -s -XPOST 'localhost:9200/_search' -d '{ "query" : { ... }, "suggest" : { ... } }' ``` Several suggestions can be specified per request. Each suggestion is identified with an arbitary name. In the example below two suggestions are requested. Both `my-suggest-1` and `my-suggest-2` suggestions use the `fuzzy` suggester, but have a different `text`. ``` "suggest" : { "my-suggest-1" : { "text" : "the amsterdma meetpu", "fuzzy" : { "field" : "body" } }, "my-suggest-2" : { "text" : "the rottredam meetpu", "fuzzy" : { "field" : "title", } } } ``` The below suggest response example includes the suggestion response for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains entries. Each entry is effectively a token from the suggest text and contains the suggestion entry text, the original start offset and length in the suggest text and if found an arbitary number of options. ``` { ... "suggest": { "my-suggest-1": [ { "text" : "amsterdma", "offset": 4, "length": 9, "options": [ ... ] }, ... ], "my-suggest-2" : [ ... ] } ... } ``` Each options array contains a option object that includes the suggested text, its document frequency and score compared to the suggest entry text. The meaning of the score depends on the used suggester. The fuzzy suggester's score is based on the edit distance. ``` "options": [ { "text": "amsterdam", "freq": 77, "score": 0.8888889 }, ... ] ``` # Global suggest text To avoid repitition of the suggest text, it is possible to define a global text. In the example below the suggest text is defined globally and applies to the `my-suggest-1` and `my-suggest-2` suggestions. ``` "suggest" : { "text" : "the amsterdma meetpu" "my-suggest-1" : { "fuzzy" : { "field" : "title" } }, "my-suggest-2" : { "fuzzy" : { "field" : "body" } } } ``` The suggest text can in the above example also be specied as suggestion specific option. The suggest text specified on suggestion level override the suggest text on the global level. # Other suggest example. In the below example we request suggestions for the following suggest text: `devloping distibutd saerch engies` on the `title` field with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but a nice optimalization. The suggestions are gather in the `query` phase and in the case that we only care about suggestions (so no hits) we don't need to execute the `fetch` phase. ``` curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{ "suggest" : { "my-title-suggestions-1" : { "text" : "devloping distibutd saerch engies", "fuzzy" : { "size" : 3, "field" : "title" } } } }' ``` The above request could yield the response as stated in the code example below. As you can see if we take the first suggested options of each suggestion entry we get `developing distributed search engines` as result. ``` { ... "suggest": { "my-title-suggestions-1": [ { "text": "devloping", "offset": 0, "length": 9, "options": [ { "text": "developing", "freq": 77, "score": 0.8888889 }, { "text": "deloping", "freq": 1, "score": 0.875 }, { "text": "deploying", "freq": 2, "score": 0.7777778 } ] }, { "text": "distibutd", "offset": 10, "length": 9, "options": [ { "text": "distributed", "freq": 217, "score": 0.7777778 }, { "text": "disributed", "freq": 1, "score": 0.7777778 }, { "text": "distribute", "freq": 1, "score": 0.7777778 } ] }, { "text": "saerch", "offset": 20, "length": 6, "options": [ { "text": "search", "freq": 1038, "score": 0.8333333 }, { "text": "smerch", "freq": 3, "score": 0.8333333 }, { "text": "serch", "freq": 2, "score": 0.8 } ] }, { "text": "engies", "offset": 27, "length": 6, "options": [ { "text": "engines", "freq": 568, "score": 0.8333333 }, { "text": "engles", "freq": 3, "score": 0.8333333 }, { "text": "eggies", "freq": 1, "score": 0.8333333 } ] } ] } ... } ``` # Common suggest options: * `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion. # Common fuzzy suggest options * `field` - The field to fetch the candidate suggestions from. This is an required option that either needs to be set globally or per suggestion. * `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field. * `size` - The maximum corrections to be returned per suggest text token. * `sort` - Defines how suggestions should be sorted per suggest text term. Two possible value: `score` - Sort by sore first, then document frequency and then the term itself. `frequency` - Sort by document frequency first, then simlarity score and then the term itself. * `suggest_mode` - The suggest mode controls what suggestions are included or controls for what suggest text terms, suggestions should be suggested. Three possible values can be specified: `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default. `popular` - Only suggest suggestions that occur in more docs then the original suggest text term. ** `always` - Suggest any matching suggestions based on terms in the suggest text. # Other fuzzy suggest options: * `lowercase_terms` - Lower cases the suggest text terms after text analyzation. * `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value result in an bad request error being thrown. Defaults to 2. * `min_prefix` - The number of minimal prefix characters that must match in order be a candidate suggestions. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms. * `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4. * `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Due to the fact that terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise. * `max_inspections` - A factor that is used to multiply with the `shards_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5. * `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option. * `max_query_frequency` - The maximum threshold in number of documents a sugges text token can exist in order to be included. Can be a relative percentage number (e.g 0.4) or an absolute number to represent document frequencies. If an value higher than 1 is specified then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly on top of this this also improves the spellcheck performance. The shard level document frequencies are used for this option.	2013-01-28 15:18:18 +01:00
Simon Willnauer	48488f707f	Expose CommonTermsQuery in Match & MultiMatch and enable highlighting Closes #2591	2013-01-28 11:57:05 +01:00
Simon Willnauer	5c89d66216	move ShardsAllocatorModuleTests to o.e.t.integration	2013-01-25 22:26:30 +01:00
Shay Banon	042a5d02d9	Primary shard failure with initializing replica shards can cause the replica shard to cause allocation failures fixes #2592	2013-01-25 17:59:01 +01:00
Shay Banon	990acff4f7	make sure we wait for yellow stats in suggest API when searching on clean index	2013-01-24 22:31:51 +01:00
Martijn van Groningen	9013eeae8a	Added filter support in the `has_child` and `has_parent` filters. Example: ``` curl -XPOST 'localhost:9200/_search' -d '{ "query": { "filtered_query": { "query": { "match": { "title": "distributed systems" } }, "filter": { "has_child": { "type": "tag", "filter": { "term": { "name": "book" } } } } } } }' ``` Closes #2585	2013-01-24 21:32:38 +01:00
Martijn van Groningen	98a674fc6e	Added suggest api. # Suggest feature The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment there the only supported suggester is `fuzzy`. The suggest feature is available since version `0.21.0`. # Fuzzy suggester The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query into account that is part of request. # Suggest API The suggest request part is defined along side the query part as top field in the json request. ``` curl -s -XPOST 'localhost:9200/_search' -d '{ "query" : { ... }, "suggest" : { ... } }' ``` Several suggestions can be specified per request. Each suggestion is identified with an arbitary name. In the example below two suggestions are requested. The `my-suggest-1` suggestion uses the `body` field and `my-suggest-2` uses the `title` field. The `type` field is a required field and defines what suggester to use for a suggestion. ``` "suggest" : { "suggestions" : { "my-suggest-1" : { "type" : "fuzzy", "field" : "body", "text" : "the amsterdma meetpu" }, "my-suggest-2" : { "type" : "fuzzy", "field" : "title", "text" : "the rottredam meetpu" } } } ``` The below suggest response example includes the suggestions part for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains a terms array, that contains all terms outputted by the analyzed suggest text. Each term object includes the term itself, the original start and end offset in the suggest text and if found an arbitary number of suggestions. ``` { ... "suggest": { "my-suggest-1": { "terms" : [ { "term" : "amsterdma", "start_offset": 5, "end_offset": 14, "suggestions": [ ... ] } ... ] }, "my-suggest-2" : { "terms" : [ ... ] } } ``` Each suggestions array contains a suggestion object that includes the suggested term, its document frequency and score compared to the suggest text term. The meaning of the score depends on the used suggester. The fuzzy suggester's score is based on the edit distance. ``` "suggestions": [ { "term": "amsterdam", "frequency": 77, "score": 0.8888889 }, ... ] ``` # Global suggest text To avoid repitition of the suggest text, it is possible to define a global text. In the example below the suggest text is a global option and applies to the `my-suggest-1` and `my-suggest-2` suggestions. ``` "suggest" : { "suggestions" : { "text" : "the amsterdma meetpu", "my-suggest-1" : { "type" : "fuzzy", "field" : "title" }, "my-suggest-2" : { "type" : "fuzzy", "field" : "body" } } } ``` The suggest text can be specied as global option or as suggestion specific option. The suggest text specified on suggestion level override the suggest text on the global level. # Other suggest example. In the below example we request suggestions for the following suggest text: `devloping distibutd saerch engies` on the `title` field with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but a nice optimalization. The suggestions are gather in the `query` phase and in the case that we only care about suggestions (so no hits) we don't need to execute the `fetch` phase. ``` curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{ "suggest" : { "suggestions" : { "my-title-suggestions" : { "suggester" : "fuzzy", "field" : "title", "text" : "devloping distibutd saerch engies", "size" : 3 } } } }' ``` The above request could yield the response as stated in the code example below. As you can see if we take the first suggested term of each suggest text term we get `developing distributed search engines` as result. ``` { ... "suggest": { "my-title-suggestions": { "terms": [ { "term": "devloping", "start_offset": 0, "end_offset": 9, "suggestions": [ { "term": "developing", "frequency": 77, "score": 0.8888889 }, { "term": "deloping", "frequency": 1, "score": 0.875 }, { "term": "deploying", "frequency": 2, "score": 0.7777778 } ] }, { "term": "distibutd", "start_offset": 10, "end_offset": 19, "suggestions": [ { "term": "distributed", "frequency": 217, "score": 0.7777778 }, { "term": "disributed", "frequency": 1, "score": 0.7777778 }, { "term": "distribute", "frequency": 1, "score": 0.7777778 } ] }, { "term": "saerch", "start_offset": 20, "end_offset": 26, "suggestions": [ { "term": "search", "frequency": 1038, "score": 0.8333333 }, { "term": "smerch", "frequency": 3, "score": 0.8333333 }, { "term": "serch", "frequency": 2, "score": 0.8 } ] }, { "term": "engies", "start_offset": 27, "end_offset": 33, "suggestions": [ { "term": "engines", "frequency": 568, "score": 0.8333333 }, { "term": "engles", "frequency": 3, "score": 0.8333333 }, { "term": "eggies", "frequency": 1, "score": 0.8333333 } ] } ] } } ... } ``` # Common suggest options: * `suggester` - The suggester implementation type. The only supported value is 'fuzzy'. This is a required option. * `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion. # Common fuzzy suggest options * `field` - The field to fetch the candidate suggestions from. This is an required option that either needs to be set globally or per suggestion. * `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field. * `size` - The maximum corrections to be returned per suggest text token. * `sort` - Defines how suggestions should be sorted per suggest text term. Two possible value: `score` - Sort by sore first, then document frequency and then the term itself. `frequency` - Sort by document frequency first, then simlarity score and then the term itself. * `suggest_mode` - The suggest mode controls what suggestions are included or controls for what suggest text terms, suggestions should be suggested. Three possible values can be specified: `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default. `popular` - Only suggest suggestions that occur in more docs then the original suggest text term. ** `always` - Suggest any matching suggestions based on terms in the suggest text. # Other fuzzy suggest options: * `lowercase_terms` - Lower cases the suggest text terms after text analyzation. * `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value result in an bad request error being thrown. Defaults to 2. * `min_prefix` - The number of minimal prefix characters that must match in order be a candidate suggestions. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms. * `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4. * `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Due to the fact that terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise. * `max_inspections` - A factor that is used to multiply with the `shards_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5. * `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option. * `max_query_frequency` - The maximum threshold in number of documents a sugges text token can exist in order to be included. Can be a relative percentage number (e.g 0.4) or an absolute number to represent document frequencies. If an value higher than 1 is specified then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly on top of this this also improves the spellcheck performance. The shard level document frequencies are used for this option. Closes #2585	2013-01-24 15:41:06 +01:00
Simon Willnauer	4eefcb9c82	Expose CommonTermsQuery Closes #2583	2013-01-24 14:18:01 +01:00
Shay Banon	c2f35621f6	allow to get settings as delimited string	2013-01-24 12:03:16 +01:00
Shay Banon	b143822bac	allow to load settings from delimited string	2013-01-24 12:00:14 +01:00
Simon Willnauer	88f68264c7	Reuse MemoryIndex instances across Percolator requests. * added configurable MemoryIndexPool that pools MemoryIndex instance across Threads * Pool can be configured based on the number of pooled instances as well as the maximum number of bytes that is reused across the pooled instances Closes #2581	2013-01-24 11:53:21 +01:00
Shay Banon	e8c1180ede	add field data stats	2013-01-24 11:38:18 +01:00
Shay Banon	613b746299	move field data type to simply be type and settings	2013-01-24 09:33:16 +01:00
Martijn van Groningen	346422b747	Added sparse multi ordinals implementation for field data.	2013-01-23 22:11:31 +01:00
Shay Banon	a74e7f8099	refactor geo to extract common classes	2013-01-23 14:14:21 +01:00
Shay Banon	d969e61999	Remove settings option for index store compression, compression is always enabled closes #2577	2013-01-23 13:11:48 +01:00
Simon Willnauer	2880cd0172	Upgrade to Lucene 4.1 * Removed CustmoMemoryIndex in favor of MemoryIndex which as of 4.1 supports adding the same field twice * Replaced duplicated logic in X[]FSDirectory for rate limiting with a RateLimitedFSDirectory wrapper Remove hacks to find out merge context in rate limiting in favor of IOContext * replaced Scorer#freq() return type (from float to int) * Upgraded FVHighlighter to new 'centered' highlighting * Fixed RobinEngine to use seperate setCommitData	2013-01-23 11:54:11 +01:00
Igor Motov	bbfd3957eb	Improve stability of the testNodesInfos test	2013-01-22 12:29:38 -05:00
Igor Motov	9becdb814a	Improve stability of the shardsCleanup test	2013-01-22 10:20:18 -05:00
Shay Banon	c295211a85	final move to new field data	2013-01-22 16:16:33 +01:00
uboness	09cc70b8c9	added predefined empty implementation for all atomic field datas	2013-01-22 16:16:33 +01:00
Shay Banon	772ee9db54	move terms to use new field data	2013-01-22 16:16:32 +01:00
Shay Banon	e5b651321f	remove some safe methods because of the new makeSafe method usage	2013-01-22 16:16:32 +01:00
Shay Banon	5b7173fc35	move sorting to work with new field data	2013-01-22 16:16:32 +01:00
Shay Banon	45f27fe96a	add packed bytes variant for strings/bytes	2013-01-22 16:16:32 +01:00
uboness	855b64a8a7	byte field data implementation	2013-01-22 16:16:31 +01:00
uboness	f1f3c241fd	short field data implementation	2013-01-22 16:16:31 +01:00
uboness	3840439365	float field data implementation	2013-01-22 16:16:31 +01:00
uboness	fc09ce7ac9	Implemented int field data	2013-01-22 16:16:31 +01:00
Shay Banon	e0b280f9b3	use FieldMapper.Names for fieldNames, and not just fieldName as string	2013-01-22 16:16:30 +01:00
Shay Banon	7dc5cf9799	add long field support	2013-01-22 16:16:30 +01:00
Shay Banon	7397007e05	initial commit	2013-01-22 16:16:30 +01:00
Simon Willnauer	35cf9ee11d	wait for cluster to be formed in SimpleNodesInfoTests	2013-01-19 15:44:26 +01:00
Simon Willnauer	d6b613ac8c	Respect lowercase_expanded_terms in MappingQueryParser Fixes #2566	2013-01-19 13:57:45 +01:00
Simon Willnauer	c563248f76	testMoreLikeThisIssue2197 should create index mapping first to prevent races	2013-01-18 16:41:37 +01:00
Simon Willnauer	6f38a3a8a8	create index and mapping first to ensure all relevant nodes see the mapping	2013-01-18 16:09:24 +01:00
Simon Willnauer	393de984bd	Remove deprecated StreamInput/Output#read/writeUTF	2013-01-17 22:38:42 +01:00
Simon Willnauer	3d80c53192	Allow ShardsAllocator to be configured via node level settings. * Default ShardsAllocator is set to BalancedShardsAllocator * Core ShardsAllocator implementations can be defined via 'cluster.routing.allocation.type' * Core ShardsAllocator implementations are exposed via short keys 'balanced' (BalancedShardsAllocator) and 'even_shards' (EvenShardsCountAllocator) * Third party allocators can be loaded via fully-qualified class names. Closes #2557	2013-01-17 16:23:52 +01:00
Simon Willnauer	2eb09e6b1a	Added BalancedShardsAllocator that balances shards based on a weight function. * Weights are calculated per index and incorporate index level, global and primary related parameters * Balance operations are executed based on a win maximation strategy that tries to relocate shards first that offer the biggest gain towards the weight functions optimum * The WeightFunction allows settings to prefer index based balance over global balance and vice versa * Balance operations can be throttled by raising a threshold resulting in less agressive balance operations * WeightFunction shipps with defaults to achive evenly distributed indexes while maintaining a global balance Closes #2555	2013-01-17 12:02:42 +01:00
Igor Motov	d97839b8a8	Fix char filter issues introduced during lucene 4 migration Fixes #2543	2013-01-14 12:43:02 -05:00
Igor Motov	6243f8e64d	Disallow unknown custom indexing parameters Fixes #2354	2013-01-11 10:14:25 -05:00
Martijn van Groningen	1ce10dfb06	Fixed issue where parent & child queries can fail if a segment doesn't have documents with the targeted type or associated parent type Closes #2537	2013-01-11 16:06:14 +01:00
Shay Banon	2c4b9d9ba2	cleanup queryHint since its not was never used preference ended up as the way to control routing	2013-01-07 04:02:45 +01:00
Shay Banon	0e5287f1f2	Binary Mapped Fields: Allow to not store them by default, and return BytesReference fixes #2523 also, fix another point of normalization of the result for get API	2013-01-05 01:50:46 +01:00
Shay Banon	4b9fcdb900	noramalize the value even when getting it from source we need to in order to properly handle bytes, and normalize Integer to Long for example for consistency, the fact that mappers now handle different Objtes help here	2013-01-04 23:55:34 +01:00
Shay Banon	bf4c442509	add refresh before calling count	2013-01-04 08:38:00 +01:00
Shay Banon	70f1e2c987	remove problematic timeout test the timeout feature, even if set to 0, might still mean we get an ack back...	2013-01-04 08:19:01 +01:00
Igor Motov	d73a6663b7	Changing non-nested object mapping to nested should fail Fixes #2518	2013-01-03 18:18:40 -05:00
uboness	dc25939b7c	fixed hunspell test to clean up properly, this time, for realz	2013-01-03 22:54:37 +01:00
uboness	86f55b3a45	fixed hunspell test to clean up properly	2013-01-03 22:12:36 +01:00
Martijn van Groningen	7cf80aca99	Changed how the stored values of numeric fields are stored in the index. Before numeric values were stored in binary representation, now the values in numeric representation.	2013-01-03 21:34:53 +01:00
uboness	6c4108b38a	Support for hunspell token filter Closes: #646 - Introduced HunspellService which holds a repository of hunspell dictionaries - It is possible to register a dictionary via a plugin or by placing the dictionary files on the file system	2013-01-02 03:51:26 +01:00
Shay Banon	b6f766af3f	backport lucene 4.1 terms filter and use it where applicable	2012-12-29 10:39:53 -08:00
Shay Banon	b08e8fb76c	add explicit termsFilter in mapper, and use that in terms filter This also enabled support for terms filter on _id field for example	2012-12-29 01:06:32 -08:00
Shay Banon	01ba287164	more mapper simplification, reduce the value methods	2012-12-28 20:33:10 -08:00
Shay Banon	e02015c641	Use FieldType and not deprecated Field construction	2012-12-28 14:27:09 -08:00
Shay Banon	64a01c28c3	rename fieldQuery/fieldFilter to termQuery/termFilter in mappers	2012-12-28 13:48:48 -08:00
Shay Banon	7fb98769a6	add a sleep to make sure settings are applied	2012-12-27 14:16:30 -08:00
Igor Motov	b7ff23ff93	Update settings: Allow to dynamically update thread pool settings Closes #2509	2012-12-27 09:39:27 -05:00
Simon Willnauer	750c30f0b8	allow index and type to be specified as arrays in MultiSearchRequest	2012-12-26 16:18:17 -08:00
Shay Banon	ef55e4feec	fix failed tests due to wrapping failures with mapping parsing exception	2012-12-26 15:59:07 -08:00
Shay Banon	8a17222ff2	match_all filter with empty array (instead of obj) fires exception when used with facets fixes #2493	2012-12-26 15:35:30 -08:00
Shay Banon	2f4b759df7	Allow highlighting on wildcard fields.. ie, comment_* closes #2396	2012-12-26 15:00:31 -08:00
Shay Banon	4b69846ba2	cleanup	2012-12-26 14:21:04 -08:00
Simon Willnauer	90bd82ac50	Pass topScorer=false to sub-scorers if a scorer is wrapped. Wrapped BooleanQuery can return collect-only scorers. See #2505	2012-12-26 14:20:14 -08:00
Martijn van Groningen	c93babed42	Minor changes to the parent / child benchmark.	2012-12-24 22:12:10 +01:00
Martijn van Groningen	c6aaefa27f	Improved explain support for nested query. Closes #2503	2012-12-24 13:40:20 +01:00
Martijn van Groningen	d57d89937f	Added scoring support to `has_child` and `has_parent` queries. Added score support to `has_child` and `has_parent` queries. Both queries support a score_type option. The has_child support the same options as the top_children query and the none option which is the default and yields the current behaviour. The has_parent query support the score type options: score and none. The latter is the default and yields the current behaviour. If the score_type is set to a value other than none then the has_parent query map the matched parent score into the related children documents. The has_child query then map the matched children documents into the related parent document. The score_type on both queries defines how the children documents scores are mapped in the parent documents. Both queries are executed in two phases. First phase collects the parent uid values of matching documents with an aggregated score per parent uid value. In the second phase either child or parent typed documents are emitted as hit that have the same parent uid value as found during the first phase. The score computed in the first phase will be used as score. Closes #2502	2012-12-24 11:39:43 +01:00
Igor Motov	fcdc36977c	Fix failure message serialization in MultiSearchResponse Fixes #2498	2012-12-21 19:26:48 -05:00
Martijn van Groningen	08b026d060	Fixed `top_children` query failure with dfs_query* search types. Fixed error with the top_children query when `DFS_QUERY_*` is used as search_type and wraps a query that gets rewritten (E.g wildcard query). Closes #2501	2012-12-21 18:08:44 +01:00
Martijn van Groningen	694989141b	Fixed AOBE when using `top_children` in a must not clause. Closes #2500	2012-12-21 16:47:03 +01:00
Martijn van Groningen	826a6ab02a	Improved XBooleanFilter by adding drive logic for bit based filter impl and adding unit test, which tests all possible XBooleanFilter options.	2012-12-19 22:43:47 +01:00
Shay Banon	14678a91ab	nested path to be represented as bytes as well as string	2012-12-18 15:39:13 -08:00
Shay Banon	ac253178bd	more cleanup in mappings	2012-12-18 13:28:36 -08:00
Shay Banon	1867ef5084	simplify toXContent generation of field mappers	2012-12-18 13:00:47 -08:00
Martijn van Groningen	ddea22771e	Fixed mlt api bug related to custom routing value. If the a routing value isn't id based, the get part of the mlt request couldn't retrieve the document for the second part of the mlt request and a 500 code is returned instead. This fix addresses this issue. Closes #2489	2012-12-17 11:00:30 +01:00
uboness	8b74c42099	Support for RegexpQuery & RegexpFilter - Added "regexp" query type (based on Lucene 4 RegexpQuery) - Added "regexp" filter type - Fixed a bug in IdFieldMapper where prefixQuery on a single type would be redundantly wrapped in a boolean query	2012-12-16 23:24:18 +01:00
Igor Motov	c8285739d2	Correctly parse : query into matchAllDocsQuery Fixes #2486	2012-12-14 14:36:20 -08:00
Shay Banon	36fd76b826	don't call toLowerCase on each bulk item	2012-12-12 21:56:23 -08:00
Martijn van Groningen	ea9a4d70cf	lucene 4: Removed the usage of Document & Field when retrieving stored fields.	2012-12-06 18:18:52 +01:00
Igor Motov	d947dfde2b	Add support for ignoring settings in system properties. An elasticsearch node can be instructed to ignore settings specified in system properties by setting config.ignore_system_properties setting to true.	2012-12-06 09:37:36 -05:00
Martijn van Groningen	f72d5c1907	Expose fragmenter option for plain / normal highlighter. Closes #2465	2012-12-06 14:59:42 +01:00
Shay Banon	c22b521800	fix properly handling acceptDocs in filters our idea is to apply it on the "filtered/constant" level, and not on compound filters, so we won't apply it multiple times. The solution is conservative a bit now, we can further optimize it in the future, for example, not to wrap it when no caching is done within the filter chain	2012-12-06 01:55:16 +01:00
Martijn van Groningen	6cfd938dce	Fixed unable to highlight on all multi-valued field values. Closes #2384	2012-12-03 12:39:18 +01:00
Shay Banon	f17ad829ac	remove snappy support relates to #2459	2012-12-03 12:30:13 +01:00
Shay Banon	a2a8553faf	Indexing Slow Log closes #2457	2012-12-03 10:21:59 +01:00
Igor Motov	6021515567	The relevancy score in explanation should match the actual score in custom_filters_query Fixes #2441	2012-11-27 10:13:16 -08:00
Shay Banon	69ef822da6	cleanup docsets - remove the DocSet abstraction, and use Bits where we can by getting it from DocIdSet - better handling of acceptDocs, though still need to properly apply them when caching is involved	2012-11-27 10:04:21 -08:00
Igor Motov	fb9143aac1	fix sporadically disappearing fields during concurrent dynamic mapping updates	2012-11-24 14:02:58 +01:00
Simon Willnauer	32a0772821	#2436 expose KeepWordTokenFilter by default	2012-11-23 10:11:30 +01:00
Igor Motov	65a43d3ad4	Fix handling of stop word _lang_ notation Fixes #2412	2012-11-23 09:54:02 +01:00
Shay Banon	e1679b89bb	fix failed test that were using the wrong form match query	2012-11-22 15:14:02 +01:00
Shay Banon	192cf5298a	fix failed test that were using the wrong form match query	2012-11-22 14:44:03 +01:00
Chris Male	2541847945	Added control over Query used by MatchQuery with there are zero terms after analysis	2012-11-22 22:13:29 +13:00
Chris Male	9e2469e04f	Add per-field Similarity support	2012-11-21 12:44:59 +13:00
Martijn van Groningen	be70722de7	Renamed pulsing40 and Lucene40 postings format providers to pulsing and default respectively for more consistent naming in settings.	2012-11-15 09:54:00 +01:00
Martijn van Groningen	20c6085852	changed test method names.	2012-11-15 09:40:24 +01:00
Martijn van Groningen	e80f74584b	Added licence header.	2012-11-15 00:18:49 +01:00
Martijn van Groningen	fd5bd102aa	lucene 4: Exposed Lucene's codec api This feature adds the option to configure a `PostingsFormat` and assign it to a field in the mapping. This feature is very expert and in almost all cases Elasticsearch's defaults will suite your needs. ## Configuring a postingsformat per field There're several default postings formats configured by default which can be used in your mapping: a* `direct` - A codec that wraps the default postings format during write time, but loads the terms and postinglists into memory directly in memory during read time as raw arrays. This postings format is exceptional memory intensive, but can give a substantial increase in search performance. * `memory` - A codec that loads and stores terms and postinglists in memory using a FST. Acts like a cached postingslist. * `bloom_default` - Maintains a bloom filter for the indexed terms, which is stored to disk and builds on top of the `default` postings format. This postings format is useful for low document frequency terms and offers a fail fast for seeks to terms that don't exist. * `bloom_pulsing` - Similar to the `bloom_default` postings format, but builds on top of the `pulsing` postings format. * `default` - The default postings format. The default if none is specified. On all fields it possible to configure a `postings_format` attribute. Example mapping: ``` { "person" : { "properties" : { "second_person_id" : {"type" : "string", "postings_format" : "pulsing"} } } } ``` ## Configuring a custom postingsformat It is possible the instantiate custom postingsformats. This can be specified via the index settings. ``` { "codec" : { "postings_format" : { "my_format" : { "type" : "pulsing40" "freq_cut_off" : "5" } } } } ``` In the above example the `freq_cut_off` is set the 5 (defaults to 1). This tells the pulsing postings format to inline the postinglist of terms with a document frequency lower or equal to 5 in the term dictionary. Closes #2411	2012-11-14 23:54:29 +01:00
Igor Motov	120560bd0a	Using non-mapped fields in prefix queries shouldn't cause NullPointerException Fixes #2408	2012-11-14 18:34:54 +01:00
Igor Motov	f47d62cc30	Date fields shouldn't be returned as longs by Get API	2012-11-13 21:36:28 +01:00
Igor Motov	d1281d283b	Add `index.routing.allocation.require....` and `cluster.routing.allocation.require....` settings Fixes #2404	2012-11-13 19:29:20 +01:00
Martijn van Groningen	978c95649e	lucene 4: Fixed SimpleQueryTests	2012-11-12 13:44:42 +01:00
Martijn van Groningen	05746adeb2	lucene 4: Set number of replicas to 0. Makes the test run faster.	2012-11-12 13:44:42 +01:00
Igor Motov	c2f3eab7d3	lucene 4: fix sorting	2012-11-12 13:44:42 +01:00
uboness	cae66fb636	* lucene 4: added missing short support in stream input/output * lucene 4: added more extensive test for stored fields	2012-11-12 13:44:41 +01:00
Igor Motov	f8842d5a4f	lucene 4: fix TokenFilterTests	2012-11-12 13:44:41 +01:00
Shay Banon	9d5cae23fa	lucene 4: fix general mapping test no need to test for boost, we already have specific boost tests, in general, we should get rid of this test, and use more specialized tests if we are missing some	2012-11-12 13:44:41 +01:00
Shay Banon	5c45aad260	lucene 4: fix boost mapping tests	2012-11-12 13:44:41 +01:00
Igor Motov	3f3a95668b	lucene4: add support for omit_norm setting to numeric types and don't omit norms if boost is not 1.0 This commit enables setting boost for numeric fields. However, there is still no way to take advantage of boosted numeric fields during searching because all queries against numeric fields are translated into range queries wrapped in ConstantScore. Boost for numeric fields is broken on master as well https://gist.github.com/7ecedea4f6a5219efb89	2012-11-12 13:44:40 +01:00
Igor Motov	2fb3591792	lucene4: fixed default values tests to refer to correct default FieldType constants	2012-11-12 13:44:40 +01:00
Igor Motov	a5bef30be9	lucene4: fixed CompressIndexInputOutputTests	2012-11-12 13:44:40 +01:00
Igor Motov	3816366780	lucene4: fixed SimpleAllMapperTests	2012-11-12 13:44:40 +01:00
Shay Banon	a38064913f	lucene 4: fix engine tests	2012-11-12 13:44:40 +01:00
Igor Motov	bf13f3f81e	lucene4: fixed SimpleIndexQueryParserTests	2012-11-12 13:44:39 +01:00
Martijn van Groningen	db639e5c2e	lucene 4: Upgraded SimpleLuceneTests class. Test actually passes now.	2012-11-12 13:44:39 +01:00
Martijn van Groningen	2a8161d096	lucene 4: Upgraded SimpleLuceneTests class. The complete codebase compiles now!	2012-11-12 13:44:39 +01:00
Martijn van Groningen	aa2a8c66cc	lucene 4: Upgraded UidFieldTests class.	2012-11-12 13:44:39 +01:00
Martijn van Groningen	5c0ef796e8	lucene 4: Upgraded BoostMappingTests + SimpleMapperTests	2012-11-12 13:44:39 +01:00
Shay Banon	cefe2ba870	lucene 4: fix fuzzy query test	2012-11-12 13:44:39 +01:00
Igor Motov	787b7a3900	lucene4: more unit test cleanup	2012-11-12 13:44:37 +01:00
Igor Motov	5ad40205c2	lucene4: remove DocumentBuilder and FieldBuilder	2012-11-12 13:44:37 +01:00
Igor Motov	bb76542068	lucene4: unit tests cleanup	2012-11-12 13:44:37 +01:00
Igor Motov	6b4e483f55	lucene4: fixed unit.index.mapper, unit.index.query and unit.index.store test (with exception of document boost and similarity issues)	2012-11-12 13:44:37 +01:00
Igor Motov	6bbe37f876	lucene4: fixed integration tests that got broken by switch from String to Text in Facet terms	2012-11-12 13:44:36 +01:00
Shay Banon	f572a7bcf7	lucene 4: no close on searcher anymore	2012-11-12 13:44:31 +01:00
Martijn van Groningen	454954e7be	lucene 4: Fix field data, facets and field comparators	2012-11-12 13:44:31 +01:00
Shay Banon	a8e43578a2	Adding a type with _source or _all enabled fails, when these are disabled in index fixes #2394	2012-11-09 17:21:25 +01:00
Igor Motov	af1e8c0eb1	Add auto index creation on update request Fixes #2375	2012-11-02 10:18:51 -04:00
Aaron Dixon	bd9a5bfa0c	fixed issue2371 (incorrect behavior of path_match)	2012-11-01 22:11:33 +01:00
Chris Male	768b8b4d2b	Changed SpatialRelation contains to within	2012-11-01 22:03:47 +01:00
Igor Motov	23f7b0002a	Deleting a non-existent warmer shouldn't cause request to hang Fixes #2363	2012-11-01 21:49:54 +01:00
Igor Motov	a2628b5eb2	Upsert should return fields Fixes #2362	2012-11-01 21:43:23 +01:00
Igor Motov	29928a9e15	Add test for percolation with the _size field enabled	2012-10-23 23:08:52 +02:00
Igor Motov	c551f93cae	Add highlighter type switch	2012-10-23 02:25:55 +02:00
Shay Banon	04eabbd38a	Mapping: string mapping to automatically set omit_norms to true and index_options to docs when setting index to not_analyzed closes #2349	2012-10-22 19:01:01 +02:00
Martijn van Groningen	ee5df74a6b	Fixed delete by query issue with index aliases and nested mappings. The issue was that under these circumstances the delete by query operation would run forever. What also is fixed is that during shard recovery when delete by query is replayed nested docs are also deleted. Closes #2302	2012-10-20 23:39:15 +02:00
Shay Banon	246dc1d992	formatting	2012-10-19 09:48:48 +02:00
Simon Willnauer	b6a83fd8b2	#2332 support CustomScoreQuery highlighting and expose QueryBuilder on CustomScoreQuery via Java API	2012-10-19 09:23:16 +02:00
Martijn van Groningen	51e69e1a9e	Fixed NPE when using `has_parent` or `has_child` filter/query. The NPE occurred when for an arbitrary segment no parent documents exist for a has_parent filter/query and no child documents exist for a has_child filter/query. Closes #2297	2012-09-28 17:08:30 +02:00
Shay Banon	613c70c289	introduce TransportResponse a class that needs to be used when sending a response over the transport layer, with an option to have headers	2012-09-27 18:05:16 +02:00
Shay Banon	cfe7504d1c	introduce TransportRequest (with optional headers) introduce a new class, TransportRequest, which includes headers. This class can be used when sending requests over the transport layer, and ActionRequest also extends it now. This is the first phase of the refactoring part in the transport layer and action layer to allow for simpler implementations of those as well as simpler "filtering" capabilities in the future	2012-09-26 23:46:28 +02:00
Martijn van Groningen	81a6940ad3	Fixed score explain is for `custom_filters_score` query. Only the explain of the filter was included. This fix adds an explain for the inner query and wraps it in a top-level explanation.	2012-09-24 14:00:34 +02:00
Shay Banon	6e66f45f58	minor geo shape fetch improvements	2012-09-23 21:31:13 +02:00
Chris Male	05e0b4d4e0	Added ShapeFetchService with support in GeoShapeQueryParser/FilterParser	2012-09-23 21:24:13 +02:00
Chris Male	4f5e62e988	Added MultiPolygon parsing and serialization support	2012-09-21 14:03:21 +02:00
Martijn van Groningen	8080fdc509	Added types exists api The types exists api checks whether one or more types exists in one or more indices. ## Example usage curl -XHEAD 'localhost:9200/twitter/tweet' ## Options * `index` - One or more indices. Either specified as query string parameter or in the uri path. * `type` - One or more types. Either specified as query string parameter or in the uri path. * `ignore_missing` - Determines what type of indices to exclude from a request. The option can have the following values: `none` or `missing`. Closes #2273	2012-09-21 10:21:32 +02:00
Chris Male	6fc0b83e07	Upgraded to Spatial4j 0.3	2012-09-20 23:55:51 +02:00
Shay Banon	0fadbf2177	more easily add a field with boost to multi match builder	2012-09-20 12:38:13 +02:00
Shay Banon	83a39bd509	improve test by not waiting for green state, no need	2012-09-20 12:00:58 +02:00
Shay Banon	f8e1291243	global node indices level queries to be created by guice	2012-09-20 11:54:46 +02:00
Martijn van Groningen	d5aa35e0ea	Added better error handling for has_child, has_parent and top_children. If has_parent, has_child or top_children are executed incorrectly then a better exception is thrown. This gives a better error description when one of these queries or filters is being used in count api. Closes #2261	2012-09-18 13:26:23 +02:00
Shay Banon	7924115b90	Disable allocation: New indices allocation not to be disabled by default When setting cluster.routing.allocation.disable_allocation, it causes new indices primary shards to not be allocated. By default, new indices created should allow to, at the very least, allocate primary shards so they become operations. A new setting, cluster.routing.allocation.disable_new_allocation, allows to also disable "new" allocations. closes #2258.	2012-09-17 16:00:55 +02:00
Shay Banon	90e0a70e0e	cancel allocation command to allow_primary to be cancelled	2012-09-17 12:31:33 +02:00
Shay Banon	afca5ef15f	The reroute command allows to explcitiyly execute a cluster reroute allocation command including specific commands. For example, a shard can be moved from one node to another explicitly, an allocation can be canceled, or an unassigned shard can be explicitly allocated on a specific node. Here is a short example of how a simple reroute API call: curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ {"move" : {"index" : "test", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "test", "shard" : 1, "node" : "node3"}} ] }' An importnat aspect to remember is the fact that once when an allocation occurs, the cluster will aim at rebalancing its state back to an even state. For example, if the allocation includes moving a shard from `node1` to `node2`, in an "even" state, then another shard will be moved from `node2` to `node1` to even things out. The cluster can be set to disable allocations, which means that only the explicitl allocations will be performed. Obviously, only once all commands has been applied, the cluster will aim to be rebalance its state. Anohter option is to run the commands in "dry_run" (as a URI flag, or in the request body). This will cause the commands to apply to the current cluster state, and reutrn the resulting cluster after the comamnds (and rebalancing) has been applied. The commands supporterd are: * `move`: Move a started shard from one node to anotehr node. Accepts `index` and `shard` for index name and shard number, `from_node` for the node to move the shard "from", and `to_node` for the node to move the shard to. * `cancel`: Cancel allocation of a shard (or recovery). Accepts `index` and `shard` for index name and shar number, and `node` for the node to cancel the shard allocation on. * `allocate`: Allocate an unassigned shard to a node. Accepts the `index` and `shard` for index name and shard number, and `node` to allocate the shard to. It also accepts `allow_primary` flag to explciitly specify that it is allowed to explciitly allocate a primary shard (might result in data loss). closes #2256	2012-09-15 22:56:14 +02:00
Martijn van Groningen	cfe76546f2	Added has_parent query The `has_parent` query works the same as the `has_parent` filter, by automatically wrapping the filter with a constant_score. It has the same syntax as the `has_parent` filter. Closes #2254	2012-09-14 17:34:01 +02:00
Martijn van Groningen	3cd54fc4ee	Improve `has_child` filter / query performance (#2251 ) Added a new has_child filter implementation, that works _uid based instead of bitset based. This implementation is about ~2 till ~6 times (depending on the query) faster than the already existing bitset based implementation.	2012-09-14 14:37:29 +02:00
Shay Banon	ef9974ce2c	add serialization options for allocation commands	2012-09-14 14:26:39 +02:00
Martijn van Groningen	2bd9b3aed0	Added `has_parent` filter (#2243 ) The `has_parent` filter accepts a query and a parent type. The query is executed in the parent document space, which is specified by the parent type. This filter return child documents which associated parents have matched. For the rest `has_parent` filter has the same options and works in the same manner as the `has_child` filter. This is an experimental filter. Filter example ################### ``` { "has_parent" : { "parent_type" : "blog" "query" : { "term" : { "tag" : "something" } } } } ``` The `parent_type` field name can also be abbreviated to `type`. Memory considerations ############### With the current implementation, all _id values are loaded to memory (heap) in order to support fast lookups, so make sure there is enough mem for it. This issue originates from issue #792	2012-09-13 13:35:45 +02:00
Shay Banon	e530f03b94	internal custom allocation commands add support for internal custom allocation commands, including allocation, move, and cancel (shard). also, fix #2242, which causes the cluster state to be in inconsistent state when a shard being the source of relocation is failed	2012-09-12 15:13:27 +02:00
Martijn van Groningen	b6a9bd9a31	- Fixed boosting per field with multi_match query.	2012-09-12 11:43:33 +02:00
Shay Banon	24ce2ef537	handle EOF when handling arrays as well	2012-09-05 11:40:39 +02:00
Shay Banon	a42159f8d5	Shard Allocation: `index.routing.allocation....` settings do not "remove" the setting on empty string, closes #2229 .	2012-09-03 16:44:23 +02:00
Martijn van Groningen	9b29950997	Added fields option to explain api. #2203	2012-08-31 22:19:09 +02:00
Martijn van Groningen	cd0e1226e1	Added a global ignore_malformed index setting. #2220 Also extended the ignore_malformed support to TTL, Ip and timestamp field types.	2012-08-31 22:10:46 +02:00
Martijn van Groningen	dea2de3304	Add ignore_indices option for search, multisearch, count and other Broadcast request. #2209	2012-08-27 15:36:14 +03:00
Martijn van Groningen	1d4aee6086	- Explain api opens 2 engine searchers, but closes only 1 engine searcher. Closes #2206	2012-08-27 12:20:02 +03:00
Martijn van Groningen	bbe735f2cc	Fixed issue #2197	2012-08-25 00:38:26 +03:00
uboness	b4b33bb205	Local node master listener * Fixed an issue where dynamic update to minimum_master_nodes settings would not take immediate effect * Added LocalNodeMasterListener support to the ClusterService. Enables listening to when the local node becomes/stopped being a master	2012-08-24 02:25:13 +02:00
uboness	3fdb9f0a27	Enabled the option of configuring plugin types in the settings. This will also help in tests when testing plugin related functionality	2012-08-21 23:00:24 +02:00
Martijn van Groningen	8365e7ba0b	- Added explain api. #2184	2012-08-21 13:26:17 +02:00
Shay Banon	9aae62b4a6	All Field: Automatically detect when field level boosting is used, and optimize when its not, closes #2189 .	2012-08-20 15:07:32 +02:00
Shay Banon	e3a9271000	unify more count and search implementation	2012-08-19 16:54:53 +02:00
Simon Willnauer	b0b5775c98	use term query instead of a specialized SpanTermQuery on _all field if positions are omitted	2012-08-16 10:42:14 -07:00
Shay Banon	ab49a8c2fc	improve update test to wait for green cluster state	2012-08-14 01:47:18 +02:00
Simon Willnauer	53f65d8ff2	Remove / deprecated omit_term_freq_and_positions in favor of IndexOptions exposed via mapping API	2012-08-13 17:19:08 +02:00
Shay Banon	eda3da2aea	fix geo shape tests	2012-08-13 14:40:36 +02:00
Chris Male	bea4346f3a	Added GeoShape indexing and querying support	2012-08-13 13:44:29 +02:00
Martijn van Groningen	b979dfa0be	Add lenient option to match & multi_match queries. #2156	2012-08-09 21:56:50 +02:00
Shay Banon	fedd1965ea	Update API: Update through an alias with routing configured on it fail to use the routing, closes #2155 .	2012-08-09 15:14:52 +02:00
Martijn van Groningen	e43dd4687e	- Added support for multi match query.	2012-08-09 11:36:59 +02:00
Martijn van Groningen	195e586fd8	- Fixed timezone parsing when input starts with '+'sign. Fixes issue #2141	2012-08-07 22:53:00 +02:00
Martijn van Groningen	37e7a54b0e	Fixed top children query bug reported in issue #2140 Fixed type.	2012-08-06 22:02:03 +02:00
Martijn van Groningen	0e3c825501	Added ignore_malformed mapping parameter for all number like types. Issue #2120	2012-08-03 10:41:07 +03:00
Shay Banon	7a0d7f531d	fix test	2012-08-02 09:40:54 +03:00
Shay Banon	e88dbafe51	rename Test to Tests, so it will be executed as part of the mvn tests as well, reformat a bit	2012-08-01 16:20:37 +03:00
Simon Willnauer	d13a7809d1	#2116 Expose all ShingleFilter settings via ShingleTokenFilterFactory	2012-08-01 16:18:58 +03:00
Shay Banon	0492d9b8cb	fix test failure message...	2012-07-31 21:02:34 +02:00
Shay Banon	82cfe0e8b2	upgrade to latest testng, improve console output when running test, add more options as env vars when using maven	2012-07-31 20:24:39 +02:00
Shay Banon	bbc45fefe5	rename limit to ignore_above, and create a dedicated test	2012-07-31 13:00:10 +02:00
Martijn van Groningen	41b3a454cf	Issue #2121 Added limit parameter for string type.	2012-07-31 13:00:03 +02:00
Shay Banon	4eb85bbbd6	Transport/Http: Remove explicit setting of send/receive buffer, and improve netty receive buffer predictor, closes #2124 .	2012-07-30 21:37:38 +02:00
Shay Banon	7edafcf9a0	Node Stats: Add jvm buffer pools stats (when available, for java 7 and above), closes #2122 .	2012-07-29 00:49:18 +02:00
Shay Banon	57e966e9d7	upgrade to jackson 2.0.4	2012-07-10 23:44:02 +02:00
Shay Banon	99d2f27c84	Introduce Text abstraction, allowing for improved representation of strings, apply to HighlightedField (breaks backward for Java API from String to Text), closes #2093 . By introducing the Text abstraction, we can keep (long) text fields in their UTF8 bytes format, and no need to convert them to a string when serializing it back to Json for example. The first place we can apply this is to highlighted text, which can be long.. . This does breaks backward comp. for people using the Java API where the HighlightField now has a Text as its content, and not String.	2012-07-10 00:47:37 +02:00
Shay Banon	35233564fd	buffer management refactoring First phase at improving buffer management and reducing even further buffer copies. Introduce a BytesReference abstraction, allowing to more easily slice and "read/write references" from streams. This is the foundation for later using it to create smarter buffers on top of composite netty channels for example (which http now produces) as well as reducing buffer copies when sending transport/rest responses.	2012-07-07 01:26:41 +02:00
Shay Banon	8d1e04a973	have the quick rolling restart stress test also wait for 0 relocating shards	2012-07-06 01:01:18 +02:00
Shay Banon	57023c8ba9	Compression: Support snappy as a compression option, closes #2081 .	2012-07-04 17:14:12 +02:00
Shay Banon	e5c89def42	Support wildcard and +/- notation for multi index APIs, closes #2074 .	2012-07-01 18:16:04 +02:00
Shay Banon	565db26e13	Store Compression: integer overflow causes failed reads (index is safe), closes #2071 .	2012-06-30 01:37:46 +02:00
Shay Banon	8bab859822	simplify tests, doc file length	2012-06-29 16:01:17 +02:00
Shay Banon	f2e39e4ee2	Auto import dangling indices, closes #2067 .	2012-06-29 01:01:26 +02:00
Shay Banon	a872c88f03	dangling index handling might still remove the state files for the dangling index, closes #2065 .	2012-06-28 13:32:44 +02:00
Matt Weber	d6bc17fee5	Partial update without script Allow the use of "doc" as the update source when a script is not specified. New fields are added, existing fields are overwritten, and maps are merged recursively.	2012-06-27 21:29:22 +02:00
Igor Motov	a4ad84b5e4	Enable validation of queries with has_child and script filters	2012-06-27 21:23:02 +02:00
Igor Motov	dbeda1ab2b	Add missing serialization for error and explanation in validate query request	2012-06-27 21:23:01 +02:00
Shay Banon	2b893fe1e5	Use bloom filter when flushing (applying deletes), closes #2058 .	2012-06-26 16:45:29 +02:00
Shay Banon	12a644c89b	Stored Compression: failure to fetch document in certain cases (read failure, index compression works), closes #2055 .	2012-06-26 01:54:19 +02:00
Shay Banon	6e7764a083	reduce objects created with bloom filter operations	2012-06-24 20:58:44 +02:00
Shay Banon	2fb867b467	Store Compression: Term Vector Vector, closes #2049 .	2012-06-23 23:11:00 +02:00
Shay Banon	6fb836c25e	better thread naming	2012-06-23 18:35:42 +02:00
Shay Banon	1780a2a067	Failure to recover properly on node(s) restart When a node restarts, it might be canceling one recovery of a shard id only to get another one in the next cycle. We should detect this case and handle it properly. This is a fix to the annoying message seen by users: suspect illegal state: trying to move shard from primary mode to replica mode.	2012-06-22 17:46:57 +02:00
Shay Banon	cc3fab45ff	Improve cluster resiliency to disconnected sub clusters + fix a shard allocation bug with quick rolling restarts Two main changes: Improve cluster resiliency to disconnected sub clusters. If a node pings a master and that node is no longer registered with the master, improve the rejoin process of that node to the cluster. Also, if a master receives a message from another master, pick one to force to rejoin the cluster (based on cluster state versioning). On quick rolling restart, without waiting for shard allocation, the shard allocation logic can mess up its counts, causing for strange logic in allocating shards, or validation failures on routing table allocation.	2012-06-22 03:36:54 +02:00
Shay Banon	b009c9c652	Stored Fields Compression, closes #2037 . Compressing the stored fields file (the .fdt file) directly allows to have better compression on the size of the index, specifically when indexing (and storing) small documents. The compression will be considerably more effective compared to compressing each doc on its own (when setting compress on the _source mapper). The downside is that more data needs to be uncompressed when loading documents. The settings to control it is `index.store.compress.stored_fields` set to `true` (it defaults to `false`), and can be enabled dynamically using the update settings API. This allows to enabled compression at a later stage (i.e. old time based indices), and then optimize the index to make sure it gets compressed.	2012-06-20 05:31:34 +02:00
Shay Banon	fbf4c70af9	add simple compression bench	2012-06-19 13:15:44 +02:00
Martijn van Groningen	d66f401ce6	Better fix for mv field highlighting issue #1994	2012-06-19 04:13:47 +02:00
Shay Banon	aebd27afbd	abstract compression abstract the LZF compression into a compress package allowing for different implementation in the future	2012-06-19 04:07:11 +02:00
Shay Banon	1a98a9184e	fix test to shutdown threadpool	2012-06-19 03:37:08 +02:00
Shay Banon	7b3b130a62	fix tests to shutdown threadpool	2012-06-19 03:33:44 +02:00
Chris Male	040fa2581a	Added GeoDistance test which verifies the difference in behaviour between ARC and PLANE, causing elliptical results	2012-06-15 22:55:45 +02:00
Shay Banon	982c8b4664	fix test to work with new noramalization	2012-06-14 15:55:33 +02:00
Chris Male	2315e6d239	Incorporated changes to normalization of latitude and longitudes so latitude normalization is correct and longitude is normalized at the same time	2012-06-14 15:43:36 +02:00
Shay Banon	133bd72f8d	Multi Search API: Allow to set search_type on REST endpoint URI to apply to all search requests, closes #2023 .	2012-06-13 20:47:24 +02:00
Shay Banon	dfe6e58e37	use an array to represent the keys in the uid filter	2012-06-13 16:03:45 +02:00
Shay Banon	6eb419649a	better/faster parsing of update request (with upsert)	2012-06-13 13:12:37 +02:00
Shay Banon	0b4fe4add3	rename doc to upsert in update API a better descriptive name for it, and won't clash with future features on the update api	2012-06-13 12:42:10 +02:00
Martijn van Groningen	1319ed9322	Fixes highlight issue for multivalues fields described in issue #1994	2012-06-11 23:44:45 +02:00
Shay Banon	9905eab73a	Update API: Allow to upsert, provide a doc and index it if the doc does not exists, closes #2008 .	2012-06-08 02:01:04 +02:00
Shay Banon	ccea825966	terms filter uses less memory when cached move from a TreeSet to an array, sorting on creation	2012-06-07 23:34:21 +02:00
Shay Banon	f87632fabd	Query DSL: term/terms filter performance improvement (bulk reading), closes #1972 .	2012-05-23 21:54:31 +02:00
Shay Banon	2c274e59d5	Percolator: Registering (indexing) a new percolator query will still be stored in memory if actually indexing it fails, closes #1965 .	2012-05-19 19:36:01 +02:00
Shay Banon	f0007fd4ae	Create Index: Allow to provide index warmers when creating an index, closes #1917 .	2012-05-07 14:27:30 +03:00
Shay Banon	ca2dc1801c	Index Template: Allow to register index warmers in an index template, closes #1916 .	2012-05-07 14:00:37 +03:00
Shay Banon	e0f3b7e885	Index Warmup API, closes #1913 .	2012-05-06 18:50:35 +03:00
Shay Banon	aeae380258	ClassCastException during percolation query, closes #1905 .	2012-05-03 17:57:52 +03:00
Shay Banon	07f3ed05b0	Search Preference: Add _shards prefix to explicitly list shards, and add _prefer_node option, closes #1904	2012-05-03 01:12:22 +03:00
Shay Banon	8ca36c8dd5	allow internally to register index warmup actions, as well as expose stats on it	2012-04-29 00:37:20 +03:00
Shay Banon	a4fb33dbc3	Date Histogram Facet: Add `quarter` as an interval, closes #1884 .	2012-04-24 19:04:09 +03:00
Shay Banon	98b1f368f5	Better handling of fields that have `.` in their name when doing property based navigation, closes #1875 .	2012-04-19 17:28:14 +03:00
Shay Banon	03c9eaf812	NullPointerException in geo_distance_range without to, closes #1865 .	2012-04-17 15:51:45 +03:00
Shay Banon	16cd159a38	Upgrade to Lucene 3.6, closes #1862 .	2012-04-15 17:39:41 +03:00
Shay Banon	b78680c7ae	Java API Query DSL: Add wrapper filter similar to wrapper query accepting a json filter in raw format, closes #1844 .	2012-04-04 19:53:17 +03:00
Shay Banon	cdfa87827a	Update API: Allow to specify fields in the request to return updated fields, closes #1838 .	2012-04-03 14:11:22 +03:00
Benjamin Devèze	0cf0703a7b	add fields parameter for update API (#1822 )	2012-04-03 13:35:12 +03:00
Shay Banon	9fb6ecf9f0	allow to more easily plug custom unicast host providers by being able to add them to ZenDiscoveryModule using a plugin	2012-03-31 21:38:39 +03:00
Igor Motov	8859594e36	add extended validation information	2012-03-24 13:40:25 +02:00
Shay Banon	348ed11450	Have streams provided to gateway (shared one) allow marking, closes #1803 .	2012-03-22 12:20:00 +02:00
Shay Banon	752ae6e206	optimize acquiring search handler to use a search manager, also, creating a ContextIndexSearcher can be optimized if it is created from a searcher	2012-03-09 22:41:09 +02:00
Shay Banon	c08b968246	rename the cached thread pool to generic (from cached), since really, cached is meaningless, and its actually a generic thread pool we use for different operations	2012-03-09 20:32:33 +02:00
Shay Banon	e707e93942	Index Blocks: Add index.blocks.write, index.blocks.read, and index.blocks.metadata settings, closes #1771 .	2012-03-08 21:56:13 +02:00
Shay Banon	5b76222ee7	Merge branch 'create-post-bug' of https://github.com/Paikan/elasticsearch	2012-03-01 14:54:19 +02:00
Shay Banon	feaccee246	Multi level parent/child mapping and search fails, closes #1751 .	2012-03-01 14:23:58 +02:00
Benjamin Devèze	7231ee832a	set missing create param in PutRequest	2012-02-29 17:56:53 +01:00
Shay Banon	c72772e621	msearch should accept a leading \n, closes #1736 .	2012-02-27 00:27:43 +02:00
Shay Banon	9d724b8a14	fix test	2012-02-21 13:44:31 +02:00
Shay Banon	0bf61ab6c8	add pre/post zone, pre/post offset, and factor to date histogram builder	2012-02-21 12:43:28 +02:00
Shay Banon	c6130b95e5	allow to provide no header (but still \n) for msearch	2012-02-20 22:00:43 +02:00
Shay Banon	4a9cb6408c	API: Multi Search, closes #1722 .	2012-02-20 18:57:27 +02:00
Benjamin Devèze	36a4cde89f	add update integration tests	2012-02-17 23:09:52 +01:00
Shay Banon	7bd87e12a2	Indices query should accept alias names, closes #1698 .	2012-02-17 15:03:52 +02:00
Shay Banon	f997315f54	Date Mapping: Support "date math" when searching, closes #1708 .	2012-02-16 18:10:12 +02:00
Shay Banon	278e5d3a43	Transport buffer overrun can happen because of byte buffer reading optimization introduced in 0.19.0.RC1, closes #1686 .	2012-02-09 00:15:08 +02:00
Shay Banon	457f0a4266	Avoid placing a shard replica on the same machine as shard itself, closes #1680 .	2012-02-08 15:39:01 +02:00
Shay Banon	a5838dc403	improve test, wait for green state post master node startup	2012-02-01 21:17:45 +02:00
Shay Banon	f6deb45970	Cluster Allocation: cluster.routing.allocation.allow_rebalance does not allow for rebalancing on relocating shard, closes #1651 .	2012-01-30 01:58:51 +02:00
Shay Banon	70c334ec01	Index Allocation: allow to specify maximum total number of shards per node, closes #1650 .	2012-01-30 01:43:18 +02:00
Shay Banon	49b6d70dfd	Query DSL: prefix query to support _id, closes #1648 .	2012-01-29 21:09:11 +02:00
Shay Banon	bb6fb6e083	improve test to wait for 2 nodes	2012-01-28 00:26:53 +02:00
Shay Banon	da433df217	Mapping: _source mapping to allow for format to convert to (if needed), closes #1639 .	2012-01-26 00:18:46 +02:00
Shay Banon	68bb5d1434	by default, index metadata to be stored in smile format and store binary format mapping and alias filter to improve the cost it takes to persist them	2012-01-25 11:58:29 +02:00
Shay Banon	c1a2a5c910	close the multicast socket in test	2012-01-24 13:10:38 +02:00
Shay Banon	1b7d329307	add a local gateway test to make sure we recover also latest state when updating index metadata and templates	2012-01-23 00:50:32 +02:00
Shay Banon	942b427940	Local Gateway: Store specific index metadata under dedicated index locations, closes #1631 .	2012-01-22 23:34:34 +02:00
Shay Banon	534f487de3	Local Gateway: Move shard state to be stored under each shard, and not globally under _state, closes #1618 .	2012-01-18 01:08:35 +02:00
Shay Banon	801c709b42	test with local gateway	2012-01-18 01:02:55 +02:00
Benjamin Devèze	0810808864	fix bug in TTL handling where default TTL value was not set properly	2012-01-17 10:35:16 +01:00
Shay Banon	bddea09170	/_status doc count of index wrong, closes #1615 .	2012-01-16 13:48:31 +02:00
Shay Banon	21405f5aa4	Highlighting: Add boundary_chars and boundary_max_size to control text boundaries with fast vector highlighter (term vector), closes #1614 .	2012-01-15 23:05:34 +02:00
Shay Banon	e37c0904f0	Add generic execution of APIs to Client (and indices/cluster) and allow for plugins to register custom APIs, closes #1612 .	2012-01-15 16:15:09 +02:00
Shay Banon	8ee6ee05cd	Java API: Move all request builders to org.elasticsearch.action... from org.elasticsearch.client.action, closes #1611 .	2012-01-15 12:44:50 +02:00
Shay Banon	d2d65f2f65	add test marker on the class as well	2012-01-12 16:59:43 +02:00
Olivier Favre	8f0ecbcc0b	Improve latitude and longitude normalization	2012-01-12 16:58:44 +02:00
Shay Banon	04a138db5d	Allow to provide timeout parameter in request body (as well as URI parameter), closes #1604 .	2012-01-12 14:19:21 +02:00
Shay Banon	771dbdb4bc	doc nested docs and get / uid	2012-01-11 15:01:40 +02:00
Shay Banon	5b2854e8bb	Date Histogram Facet: Add `pre_offset` and `post_offset` options, closes #1599 .	2012-01-09 21:28:56 +02:00
Shay Banon	d149cbb06e	query builder builds a "safe" byte array	2012-01-09 00:17:53 +02:00
Shay Banon	0f1b3f0457	delete by query to use byte reference serialization	2012-01-08 20:52:48 +02:00
Shay Banon	858195351b	translog actions to use bytes ref serialization, and have the option to mark BytesStreamInput as unsafe	2012-01-08 17:23:37 +02:00
Shay Banon	45b5594e9b	sleep before checking for no master block	2012-01-08 12:17:53 +02:00
Shay Banon	e059e213db	removed phonetic, fix test config files	2012-01-08 12:06:30 +02:00
Shay Banon	3d51553cf2	Move phonetic token filter to a plugin, closes #1594 .	2012-01-07 23:18:30 +02:00
Shay Banon	aec5af3800	clean more test yml files	2012-01-07 00:08:09 +02:00
Shay Banon	164df9979a	remove yml file conf for test	2012-01-06 23:43:35 +02:00
Shay Banon	5c7d1d0984	remove yml file conf for test	2012-01-06 23:41:28 +02:00
Shay Banon	ec8b7c3e23	No master (startup / minimum_master_node) / not recovered blocks should cause proper failures on operations, closes #1589 .	2012-01-06 23:38:41 +02:00
Shay Banon	a18021c778	Filter cache to have just weighted (node) and none, and index query parser cache to be size based, closes #1590 .	2012-01-05 20:44:09 +02:00
Benjamin Devèze	d95aa9f266	add ttl tests with routing	2012-01-04 23:37:34 +01:00
Shay Banon	e5f2ce0fd6	use factor in scripts, so custom score function will work correctly when it multiplies	2012-01-04 21:53:26 +02:00
Shay Banon	761862a9a9	nicer exception names	2012-01-03 01:05:08 +02:00
Shay Banon	83d5084f62	Update API: Allow to update a document based on a script, closes #1583 .	2012-01-02 22:02:19 +02:00
Shay Banon	8c6b2a3077	Date Histogram Facet: Improve time zone handling, add factor option, closes #1580 .	2012-01-01 00:09:57 +02:00
Shay Banon	8cf8b478af	Scan Search: Improve performance while scrolling through it, closes #1579 .	2011-12-31 17:49:19 +02:00
Shay Banon	e47ec96ca2	Merge branch 'master' of https://github.com/dakrone/elasticsearch	2011-12-29 14:17:50 +02:00
Lee Hinman	f6b036f713	Refactor validate to validateQuery and move into indices admin action	2011-12-28 15:27:59 -07:00
Shay Banon	4e6217c54d	simplify toString for cached filter	2011-12-28 23:35:04 +02:00
Lee Hinman	be6e18cb36	Add query validation feature	2011-12-27 13:51:59 -07:00
Shay Banon	5049f60b6c	Set an index / indices to read only, or make the cluster read only, closes #1573 .	2011-12-27 20:35:07 +02:00
bbgordonn	661d04e9de	#1452 closed: block writes or metadata changes if {index,cluster}.read_only is set.	2011-12-27 17:19:03 +02:00
Shay Banon	cc3f44473f	Search: Support partial fields that can returns partial view of the _source, closes #1570 .	2011-12-26 16:49:55 +02:00
Shay Banon	aa078788f9	Nested objects not deleted on "delete by query", closes #1537 .	2011-12-25 13:33:02 +02:00
Shay Banon	73b74847aa	cleanup test	2011-12-22 23:24:12 +02:00
jayson.minard	52e6327467	unit tests for issue 1560, customfiltersscore min and multiply search modes	2011-12-22 23:19:58 +02:00
Shay Banon	415ee6425a	Allow search to continue when sort field is missing from type mapping, closes #1558 .	2011-12-22 14:25:54 +02:00
Shay Banon	fe4ba2ad55	Improve multi field mapper with highlighting based on source, closes #1559 .	2011-12-22 02:24:36 +02:00
Shay Banon	52743a05fa	rename setEncoder to setHighlighterEncoder, its not evident which encoder is refers to	2011-12-21 23:36:16 +02:00
Shay Banon	55d8d0d9c6	Analyze API: Allow to execute it without pre-creating an index, and allow to build custom analyzer (tokenizer + token_filters), closes #1555 .	2011-12-21 23:24:55 +02:00
Shay Banon	2b838b808e	add another path trie test for wildcard vs. contant	2011-12-20 17:42:06 +02:00
Shay Banon	dd6c076454	simplify and improve scaling/blocking thread pools	2011-12-20 12:03:28 +02:00
Shay Banon	41b5c3d562	wait for yellow state in test	2011-12-19 13:51:33 +02:00
Shay Banon	0328a300eb	move jmeter files under jmeter, no need for jmx	2011-12-18 01:01:12 +02:00
Shay Banon	a3ca1afed5	Translog: When not sync'ing on each operation, buffer writes, closes #1549 .	2011-12-18 00:19:35 +02:00
Shay Banon	ec04435b06	rename test	2011-12-16 23:36:31 +02:00
Shay Banon	26fc9bcb25	abstract away the fs translog file to an interface	2011-12-16 23:30:31 +02:00
Shay Banon	922833cdc4	source not returned when * specified in fields list, closes #1541 .	2011-12-14 21:18:14 +02:00
Shay Banon	de861d6f43	Support Multicast discovery for external clients, closes #1532 .	2011-12-11 18:54:07 +02:00
Shay Banon	e7eed3c182	fix package location of trove extensions	2011-12-10 00:12:42 +02:00
Shay Banon	a71f2eed99	fix test to wait for async indexing to finish	2011-12-09 10:04:11 +02:00
Shay Banon	1fd5a48409	wait for yellow status before searching	2011-12-08 16:28:57 +02:00
Shay Banon	5ea6c0bac5	wait for green status in test to make sure shards are allocated	2011-12-07 17:01:57 +02:00
Shay Banon	9781d8675d	cleanups, remove unused code	2011-12-06 16:40:07 +02:00
Shay Banon	6a71eab51f	finalize structure, tests pass	2011-12-06 02:43:17 +02:00
Shay Banon	a8fd2d48b8	first cleanup phase, move to single src	2011-12-06 00:59:23 +02:00

... 70 71 72 73 74 ...

4053 Commits