Fixed testX and testSingleNodeNoFlush by specifying the mapping on index creation instead of relying on dynamic mapping. Dynamic mapping is updated at the cluster level asynchronously, and if the mapping changes are not applied to the cluster state before the node is closed, they are not available after the node restarts. While the data added in the test is preserved, the test still fails because the mapping is missing. This is a known issue that we are not planning to fix at the moment.
Currently realtime GET does not take source includes/excludes into account.
This patch adds support for the source field mapper includes/excludes
when getting an entry from the transaction log. Even though it introduces
a slight performance penalty, it now adheres to the defined configuration
instead of returning all source data when a realtime get is done.
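For illustration, a minimal sketch of the intended behaviour with hypothetical index, type and field names: a `_source` mapping that defines includes/excludes is now honoured even when the document is fetched in realtime from the transaction log.
```json
curl -XPUT 'localhost:9200/myindex/tweet/_mapping' -d '{
    "tweet" : {
        "_source" : {
            "includes" : ["message", "user.name"],
            "excludes" : ["user.location"]
        }
    }
}'
curl -XGET 'localhost:9200/myindex/tweet/1?realtime=true'
```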
The filter method of XContentMapValues filtered out nested arrays/lists completely due to a bug that threw away all data inside such an array.
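As an illustration with hypothetical field names: with an exclude such as `user.location` configured, a document like the one below previously lost the whole `user` array during filtering; after the fix only the excluded field is stripped from each element.
```json
{
    "title" : "foo",
    "user" : [
        { "name" : "alice", "location" : "berlin" },
        { "name" : "bob", "location" : "paris" }
    ]
}
```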
Closes#2944
This bug was a follow-up problem caused by the filtering of nested arrays when source exclusion was configured.
Analysing a numeric field will return UTF-16 representations of Lucene's numeric prefix terms. Those terms are meaningless in general unless used for lookups in the Lucene index. Passing a numeric field to the analysis action is most likely a bug.
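For illustration only (hypothetical index and field names), a request like the following against a field mapped as a numeric type would previously return the encoded prefix terms rather than anything meaningful:
```json
curl -XGET 'localhost:9200/myindex/_analyze?field=my_integer_field' -d '42'
```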
Closes#2953#2952
use the latest Lucene version as specified in o.e.common.lucene.Lucene and must be upgraded with each Lucene release.
This commit adds an assert that fails once the actual Lucene version in use is higher than the current release's version.
Lucene ships with a version constant that is mainly used to provide consistent behaviour across Lucene releases. Lucene's analysis capabilities are commonly applied at index and search time, such that the search-time behaviour should be identical to the index-time behaviour in most cases. Currently ElasticSearch always uses the latest version from Lucene, which can break backwards compatibility with the index for users that rely on behaviour that changed in a new Lucene version.
Users should always use the version the index was created with unless a version is explicitly configured.
closes#2945
don't produce broken positions anymore and prevent certain highlighter bugs that fail with
StringIndexOutOfBoundsException as in #2931
This commit breaks backwards compatibility in terms of highlighting when NGramTokenFilter is used.
The highlighter will highlight the entire term as produced by the tokenizer instead of the individual
sub-grams. To do sub-gram highlighting, the ngram tokenizer should be used. This behavior was based on
broken NGramTokenFilter behavior which will be fixed in Lucene 4.4 but was ported in this commit
to elasticsearch 0.90. The broken behavior can still be used if a version < LUCENE_42 is used
in the token filter mapping.
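A sketch of how the old behaviour could still be requested, using illustrative names and assuming the usual `version` setting on analysis components:
```json
curl -XPUT 'localhost:9200/myindex' -d '{
    "settings" : {
        "analysis" : {
            "filter" : {
                "my_ngram" : {
                    "type" : "nGram",
                    "version" : "4.1",
                    "min_gram" : 2,
                    "max_gram" : 3
                }
            },
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "my_ngram"]
                }
            }
        }
    }
}'
```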
Closes#2931
- rename `open` to `open_contexts`, as we might have other search-related open stats in the future (Lucene index searchers?)
- add a test to verify it works
to the lucene version is that `discreteMultiValueHighlighting` does default to `true`. Yet
we set this anyway in the HighlightingPhase such that the classes are obsolete.
if a single highlight phrase or term was greater than the fragCharSize, producing negative string offsets
The fixed BaseFragListBuilder was added as XSimpleFragListBuilder which triggers an assert once Elasticsearch
upgrades to Lucene 4.3
In order to handle exceptions correctly when classes are not found, one needs to handle ClassNotFoundException as well as NoClassDefFoundError to be sure that every possible case is caught. We did not yet cater for the latter in ImmutableSettings.
This fix simply executes the same logic for both exceptions instead of bubbling up NoClassDefFoundError.
When specifying minimum_should_match in a multi_match query it was being applied
to the outer bool query instead of to each of the inner field-specific bool queries.
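For example (field names are illustrative), `minimum_should_match` is now applied to each field-specific bool query generated for a request such as:
```json
curl -XPOST 'localhost:9200/myindex/_search' -d '{
    "query" : {
        "multi_match" : {
            "query" : "quick brown fox",
            "fields" : ["title", "body"],
            "minimum_should_match" : "2"
        }
    }
}'
```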
Closes#2918
If cluster settings are updated, the REST API returns the accepted values. For
example, updating the `cluster.routing.allocation.disable_allocation` setting via
cluster settings:
```
curl -XPUT http://localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : "true"
    }
}'
```
will respond:
```
{
    "persistent" : {},
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : "true"
    }
}
```
Closes#2907
FieldData is an in-memory representation of the term dictionary in an uninverted form. Under certain circumstances this FieldData representation can grow very large on high-cardinality fields such as tokenized full-text. Depending on the use-case, filtering the terms that are held in the FieldData representation can greatly improve execution performance and application stability.
FieldData filters can be applied on a per-segment basis. During FieldData loading, the terms enumeration is passed through a filter predicate that either accepts or rejects a term.
## Frequency Filter
The frequency filter acts as a high / low pass filter based on the document frequency of a term within the segment that is loaded into field data. It allows rejecting terms that have a very high or very low frequency, based on absolute frequencies or on percentages relative to the number of documents in the segment, or more precisely the number of documents that have at least one value in the field being loaded for the current segment.
Here is an example mapping:
```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.frequency.min=0.001;filter.frequency.max=0.1",
                "index" : "analyzed"
            }
        }
    }
}
```
### Parameters
* `filter.frequency.min` - the minimum document frequency (inclusive) a term must have in order to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
* `filter.frequency.max` - the maximum document frequency (inclusive) a term may have in order to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
* `filter.frequency.min_segment_size` - the minimum number of documents a segment must contain for the filter to be applied. Small segments can be excluded from filtering with this setting.
## Regular Expression Filter
The regular expression filter applies a regular expression to each term during loading and only loads terms into memory that match the given regular expression.
Here is an example mapping:
```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.regex=^en_.*",
                "index" : "analyzed"
            }
        }
    }
}
```
Closes#2874
Note: This has been disabled by default and is therefore not included in a
standard build. The main reason for this is that you need to have an rpm
binary and the RPM development packages installed, which is not the case
on many systems.
The package contains an init.d-script as well as systemd configurations.
You can build your own RPM package simply by running 'maven rpm:rpm'
* In the case where only should clauses with a specific type of filter were specified, the first clause determined which documents matched.
* In some cases the "at least one should clause must match" behaviour was broken; see the example below.
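A minimal sketch of the affected case, with illustrative field names: a bool filter that contains only should clauses.
```json
curl -XPOST 'localhost:9200/myindex/_search' -d '{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "should" : [
                        { "term" : { "tag" : "wow" } },
                        { "term" : { "tag" : "elasticsearch" } }
                    ]
                }
            }
        }
    }
}'
```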
Have an explicit warmer threadpool that is dedicated to executing warmers. Currently, warmers use the search threadpool, which does not work well since the number of concurrent searches should be separate from the number of concurrent warmers allowed; also, the characteristics of the search pool (for example, a bounded queue_size) might not fit how warmers should be executed (they should not be "rejected").
closes#2815
The `geo_shape` precision could so far only be set via `tree_levels`. A new option `precision` now allows the levels of the underlying tree structure to be set via a distance such as `50m`. The default precision is set to `50m`.
## Example
```json
curl -XPUT 'http://127.0.0.1:9200/myindex/' -d '{
    "mappings" : {
        "type1" : {
            "dynamic" : "false",
            "properties" : {
                "location" : {
                    "type" : "geo_shape",
                    "geohash" : "true",
                    "store" : "yes",
                    "precision" : "50m"
                }
            }
        }
    }
}'
```
## Changes
- GeoUtils defines the [WGS84](http://en.wikipedia.org/wiki/WGS84) reference ellipsoid of earth
- DistanceUnits refer to a more precise definition of earth circumference
- DistanceUnits for inch, yard and meter have been defined
- Set default levels in GeoShapeFieldMapper to 50m precision
Closes#2803
this happens, for example, because we list assigned shards and they might not have been allocated on the relevant node yet; there is no need to list those as actual failures in some APIs
- the isAnnotationPresent bug is known and will probably be fixed in later versions, but it costs us nothing not to use it now
- some tests fail, mainly due to a consistent ordering expected from Map (within versions) which does not seem to be preserved; those tests need to be fixed to be agnostic to it
The REST Suggester API binds the 'Suggest API' to the REST Layer directly. Hence there is no need to touch the query layer for requesting suggestions.
This API extracts the Phrase Suggester API and makes 'suggestion requests' top-level objects. The complete API can be found in the
underlying ["Suggest Feature API"](http://www.elasticsearch.org/guide/reference/api/search/suggest.html).
# API Example
The following examples show how suggest actions work on the REST layer, using a simple request and its response.
## Suggestion Request
```json
curl -XPOST 'localhost:9200/_suggest?pretty=true' -d '{
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
        "phrase" : {
            "analyzer" : "bigram",
            "field" : "bigram",
            "size" : 1,
            "real_word_error_likelihood" : 0.95,
            "max_errors" : 0.5,
            "gram_size" : 2
        }
    }
}'
```
This example shows how to query a suggestion for the global text 'Xor the Got-Jewel'. A 'simple phrase' suggestion is requested and
a 'direct generator' is configured to generate the candidates.
## Suggestion Response
On success the request above will reply with a response like the following:
```json
{
    "simple_phrase" : [ {
        "text" : "Xor the Got-Jewel",
        "offset" : 0,
        "length" : 17,
        "options" : [ {
            "text" : "xorr the the got got jewel",
            "score" : 3.5283546E-4
        } ]
    } ]
}
```
The 'suggest' response contains a single 'simple phrase', which in turn contains an 'option'. This option represents a suggestion for the
queried text: it contains the corrected text and a score indicating how likely this option is to be the intended text.
Closes#2774
* Exposed the spatial strategy to be configurable as part of the geo_shape mappings
* Exposed the spatial strategy to be customizable at query time (will be used to generate the geo_shape filter/query)
* Removed XTermQueryPrefixTreeStrategy and reverted to use the lucene TermQueryPrefixTreeStrategy instead
* Made the RecursivePrefixTreeStrategy the default strategy to be used
* Removed support for all spatial operations except "intersects"
* Updated both the GeoShapeQueryBuilder and GeoShapeFilterBuilder with all the changes (removed the option of specifying the operation type, as only intersects is supported, and added the option of setting the filter/query spatial strategy; see the sketch below)
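A sketch of the mapping side, using illustrative index and field names; `strategy` selects the prefix tree strategy, with the recursive strategy being the default as noted above:
```json
curl -XPUT 'localhost:9200/myindex' -d '{
    "mappings" : {
        "type1" : {
            "properties" : {
                "location" : {
                    "type" : "geo_shape",
                    "strategy" : "recursive"
                }
            }
        }
    }
}'
```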
Closes#2720
The order in which routing and parent parameters are set is important. The
routing parameter must be set first or it will overwrite the parent routing
value.
The `term` suggester provides a very convenient API to access word alternatives on a per-token basis within a certain string distance. The API allows accessing each token in the stream individually, while suggestion selection is left to the API consumer. Yet, often already ranked / selected suggestions are required in order to present them to the end-user.
Inside ElasticSearch we have the ability to quickly access far more statistics and information to make better decisions about which token alternative to pick, or whether to pick an alternative at all.
This `phrase` suggester adds some logic on top of the `term` suggester to select entire corrected phrases instead of individual tokens, weighted based on *n-gram language models*. In practice it will be able to make better decisions about which tokens to pick based on co-occurrence and frequencies.
The current implementation is kept quite general and leaves room for future improvements.
# API Example
The `phrase` request is defined alongside the query part in the JSON request:
```json
curl -s -XPOST 'localhost:9200/_search' -d '{
    "suggest" : {
        "text" : "Xor the Got-Jewel",
        "simple_phrase" : {
            "phrase" : {
                "analyzer" : "body",
                "field" : "bigram",
                "size" : 1,
                "real_word_error_likelihood" : 0.95,
                "max_errors" : 0.5,
                "gram_size" : 2,
                "direct_generator" : [ {
                    "field" : "body",
                    "suggest_mode" : "always",
                    "min_word_len" : 1
                } ]
            }
        }
    }
}'
```
The response contains suggestions sorted by the most likely spell correction first. In this case we got the expected correction
`xorr the god jewel` first, while the second correction is less conservative, with only one of the errors corrected. Note that the request
is executed with `max_errors` set to `0.5`, so 50% of the terms can contain misspellings (see the parameter descriptions below).
```json
{
    "took" : 37,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 2938,
        "max_score" : 0.0,
        "hits" : [ ]
    },
    "suggest" : {
        "simple_phrase" : [ {
            "text" : "Xor the Got-Jewel",
            "offset" : 0,
            "length" : 17,
            "options" : [ {
                "text" : "xorr the god jewel",
                "score" : 0.17877324
            }, {
                "text" : "xor the god jewel",
                "score" : 0.14231323
            } ]
        } ]
    }
}
```
# Phrase suggest API
## Basic parameters
* `field` - the name of the field used to do n-gram lookups for the language model, the suggester will use this field to gain statistics to score corrections.
* `gram_size` - sets max size of the n-grams (shingles) in the `field`. If the field doesn't contain n-grams (shingles) this should be omitted or set to `1`.
* `real_word_error_likelihood` - the likelihood of a term being misspelled even if the term exists in the dictionary. The default is `0.95`, corresponding to 5% of the real words being misspelled.
* `confidence` - the confidence level defines a factor applied to the input phrase's score which is used as a threshold for other suggest candidates. Only candidates that score higher than the threshold will be included in the result. For instance, a confidence level of `1.0` will only return suggestions that score higher than the input phrase. If set to `0.0` the top N candidates are returned. The default is `1.0`.
* `max_errors` - the maximum percentage of the terms that can be considered misspellings in order to form a correction. This method accepts a float value in the range `[0..1)` as a fraction of the actual query terms, or a number `>= 1` as an absolute number of query terms. The default is `1.0`, meaning only corrections with at most one misspelled term are returned.
* `separator` - the separator that is used to separate terms in the bigram field. If not set, the whitespace character is used as a separator.
* `size` - the number of candidates that are generated for each individual query term. Low numbers like `3` or `5` typically produce good results. Raising this can bring up terms with higher edit distances. The default is `5`.
* `analyzer` - sets the analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field passed via `field`.
* `shard_size` - sets the maximum number of suggested terms to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to `5`.
* `text` - Sets the text / query to provide suggestions for.
## Smoothing Models
The `phrase` suggester supports multiple smoothing models to balance weight between infrequent grams (grams (shingles) that do not exist in the index) and frequent grams (that appear at least once in the index); a configuration sketch is shown after this list.
* `laplace` - the default model, which uses additive smoothing where a constant (typically `1.0` or smaller) is added to all counts to balance weights. The default `alpha` is `0.5`.
* `stupid_backoff` - a simple backoff model that backs off to lower order n-gram models if the higher order count is `0` and discounts the lower order n-gram model by a constant factor. The default `discount` is `0.4`.
* `linear_interpolation` - a smoothing model that takes the weighted mean of the unigrams, bigrams and trigrams based on user supplied weights (lambdas). Linear Interpolation doesn't have any default values. All parameters (`trigram_lambda`, `bigram_lambda`, `unigram_lambda`) must be supplied.
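A minimal sketch of selecting a smoothing model inside the `phrase` request (values are illustrative):
```json
curl -s -XPOST 'localhost:9200/_search' -d '{
    "suggest" : {
        "text" : "Xor the Got-Jewel",
        "simple_phrase" : {
            "phrase" : {
                "field" : "bigram",
                "smoothing" : {
                    "linear_interpolation" : {
                        "trigram_lambda" : 0.65,
                        "bigram_lambda" : 0.3,
                        "unigram_lambda" : 0.05
                    }
                }
            }
        }
    }
}'
```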
## Candidate Generators
The `phrase` suggester uses candidate generators to produce a list of possible terms per term in the given text. A single candidate generator is similar to a `term` suggester called for each individual term in the text. The output of the generators is subsequently scored in combination with the candidates from the other terms to form suggestion candidates.
Currently only one type of candidate generator is supported, the `direct_generator`. The phrase suggest API accepts a list of generators under the key `direct_generator`; each of the generators in the list is called per term in the original text.
## Direct Generators
The direct generators support the following parameters:
* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum corrections to be returned per suggest text token.
* `suggest_mode` - The suggest mode controls which suggestions are included, or for which suggest text terms suggestions should be generated. Three possible values can be specified:
* `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
* `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
* `always` - Suggest any matching suggestions based on terms in the suggest text.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimum number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur at the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `max_inspections` - A factor that is used to multiply with the `shards_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimum threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. This can improve quality by only suggesting high-frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified, then the number cannot be fractional. The shard-level document frequencies are used for this option.
* `max_query_frequency` - The maximum threshold in number of documents a suggest text token can exist in for it to be included. Can be a relative percentage (e.g. 0.4) or an absolute number representing document frequencies. If a value higher than 1 is specified, then it cannot be fractional. Defaults to 0.01f. This can be used to exclude high-frequency terms from being spellchecked. High-frequency terms are usually spelled correctly; on top of this, excluding them also improves spellcheck performance. The shard-level document frequencies are used for this option.
* `pre_filter` - a filter (analyzer) that is applied to each of the tokens passed to this candidate generator. This filter is applied to the original token before candidates are generated. (optional)
* `post_filter` - a filter (analyzer) that is applied to each of the generated tokens before they are passed to the actual phrase scorer. (optional)
The following example shows a `phrase` suggest call with two generators, the first one is using a field containing ordinary indexed terms and the second one uses a field that uses
terms indexed with a `reverse` filter (tokens are indexed in reverse order). This is used to overcome the limitation of the direct generators, which require a constant prefix, to provide high-performance suggestions. The `pre_filter` and `post_filter` options accept ordinary analyzer names.
```json
curl -s -XPOST 'localhost:9200/_search' -d '{
    "suggest" : {
        "text" : "Xor the Got-Jewel",
        "simple_phrase" : {
            "phrase" : {
                "analyzer" : "body",
                "field" : "bigram",
                "size" : 4,
                "real_word_error_likelihood" : 0.95,
                "confidence" : 2.0,
                "gram_size" : 2,
                "direct_generator" : [ {
                    "field" : "body",
                    "suggest_mode" : "always",
                    "min_word_len" : 1
                }, {
                    "field" : "reverse",
                    "suggest_mode" : "always",
                    "min_word_len" : 1,
                    "pre_filter" : "reverse",
                    "post_filter" : "reverse"
                } ]
            }
        }
    }
}'
```
`pre_filter` and `post_filter` can also be used to inject synonyms after candidates are generated. For instance, for the query `captain usq` we might generate the candidate `usa` for the term `usq`, which is a synonym for `america`; this allows presenting `captain america` to the user if this phrase scores high enough.
Closes#2709