OpenSearch

Commit Graph

Author	SHA1	Message	Date
tristanbuckner	9273d76cdf	Make BoolFilterBuilder output proper json	2013-03-02 01:07:50 +01:00
Shay Banon	ea097afd91	add proper testing for bool filter	2013-03-02 01:07:05 +01:00
Shay Banon	361d6bf89a	spin a bit to wait for condition in test, so slow machines will still run it correctly	2013-03-01 23:36:13 +01:00
Shay Banon	fe8b3725bb	lazy set the indices on the search request now that its validated	2013-03-01 22:45:59 +01:00
Shay Banon	6687ecb038	Query DSL: Filtered query to make query optional (defaults to mach_all) closes #2718	2013-03-01 22:40:22 +01:00
Matt Weber	dfd92265b7	Correct order of routing and parent params for Get The order in which routing and parent parameters are set is important. The routing parameter must be set first or it will overwrite the parent routing value.	2013-03-01 22:24:14 +01:00
Shay Banon	2eea99255d	Analyze API returns in YAML format if analyzed string begins with --- fixes #2624	2013-03-01 22:17:09 +01:00
Shay Banon	9b68e98ea2	more strict check before trying to parse and detect a string as a date fixes #2694	2013-03-01 22:15:32 +01:00
Jeremy Jongsma	d16efbe47f	Throw correct ClassNotFoundException to debug classloader issues	2013-03-01 21:56:59 +01:00
Simon Willnauer	aaa3c48b3c	Throw IAE if indices is null or contains a null value. Closes #2656	2013-03-01 21:26:23 +01:00
Simon Willnauer	fced68c22d	ensure that suggestion only added on reduce if they are present in the shard response	2013-03-01 21:09:10 +01:00
Martijn van Groningen	d99b532f0f	Supporting sort modes `avg` and `sum` when sorting inside nested objects. Previously this commit either sort modes `min` or `max` (depending on sort order) was used when sort modes `avg` and `sum` were picked. Closes #2701	2013-03-01 19:53:20 +01:00
Simon Willnauer	39f362326e	Short Curcuit response if no indices exits and make sure listener is notified. Closes #2692	2013-03-01 15:15:56 +01:00
Simon Willnauer	3c1f291801	Fail in metadata parsing if the id path is not a value but rather an array or an object. Closes #2275	2013-03-01 13:00:29 +01:00
Simon Willnauer	b03f3fcd6c	throw IAE if fieldname is null - Closes #2711	2013-03-01 12:10:07 +01:00
Simon Willnauer	9c3898900d	always use the max score across the shards in suggest response	2013-03-01 12:09:29 +01:00
Shay Banon	30075bb6f9	add info in test for actual search failures	2013-03-01 00:00:09 +01:00
Shay Banon	849a3677cd	improve timing in test to wait for state with graceful timeouts (yet, validate early and exit when relevant)	2013-02-28 23:44:52 +01:00
Simon Willnauer	c90c5cbf85	fix bug in StupidBackoffScorer were previous word and current word were flipped creating non-existing bigram	2013-02-28 21:23:41 +01:00
Simon Willnauer	b4b3e350a6	Expose _explain via POST Closes #2710	2013-02-28 18:19:08 +01:00
Simon Willnauer	d4ec03ed76	# Phrase Suggester The `term` suggester provides a very convenient API to access word alternatives on token basis within a certain string distance. The API allows accessing each token in the stream individually while suggest-selection is left to the API consumer. Yet, often already ranked / selected suggestions are required in order to present to the end-user. Inside ElasticSearch we have the ability to access way more statistics and information quickly to make better decision which token alternative to pick or if to pick an alternative at all. This `phrase` suggester adds some logic on top of the `term` suggester to select entire corrected phrases instead of individual tokens weighted based on a ngram-langugage models. In practice it will be able to make better decision about which tokens to pick based on co-occurence and frequencies. The current implementation is kept quite general and leaves room for future improvements. # API Example The `phrase` request is defined along side the query part in the json request: ```json curl -s -XPOST 'localhost:9200/_search' -d { "suggest" : { "text" : "Xor the Got-Jewel", "simple_phrase" : { "phrase" : { "analyzer" : "body", "field" : "bigram", "size" : 1, "real_word_error_likelihood" : 0.95, "max_errors" : 0.5, "gram_size" : 2, "direct_generator" : [ { "field" : "body", "suggest_mode" : "always", "min_word_len" : 1 } ] } } } } ``` The response contains suggested sored by the most likely spell correction first. In this case we got the expected correction `xorr the god jewel` first while the second correction is less conservative where only one of the errors is corrected. Note, the request is executed with `max_errors` set to `0.5` so 50% of the terms can contain misspellings (See parameter descriptions below). ```json { "took" : 37, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 2938, "max_score" : 0.0, "hits" : [ ] }, "suggest" : { "simple_phrase" : [ { "text" : "Xor the Got-Jewel", "offset" : 0, "length" : 17, "options" : [ { "text" : "xorr the god jewel", "score" : 0.17877324 }, { "text" : "xor the god jewel", "score" : 0.14231323 } ] } ] } } ```` # Phrase suggest API ## Basic parameters * `field` - the name of the field used to do n-gram lookups for the language model, the suggester will use this field to gain statistics to score corrections. * `gram_size` - sets max size of the n-grams (shingles) in the `field`. If the field doesn't contain n-grams (shingles) this should be omitted or set to `1`. * `real_word_error_likelihood` - the likelihood of a term being a misspelled even if the term exists in the dictionary. The default it `0.95` corresponding to 5% or the real words are misspelled. * `confidence` - The confidence level defines a factor applied to the input phrases score which is used as a threshold for other suggest candidates. Only candidates that score higher than the threshold will be included in the result. For instance a confidence level of `1.0` will only return suggestions that score higher than the input phrase. If set to `0.0` the top N candidates are returned. The default is `1.0`. * `max_errors` - the maximum percentage of the terms that at most considered to be misspellings in order to form a correction. This method accepts a float value in the range `[0..1)` as a fraction of the actual query terms a number `>=1` as an absolut number of query terms. The default is set to `1.0` which corresponds to that only corrections with at most 1 misspelled term are returned. * `separator` - the separator that is used to separate terms in the bigram field. If not set the whitespce character is used as a separator. * `size` - the number of candidates that are generated for each individual query term Low numbers like `3` or `5` typically produce good results. Raising this can bring up terms with higher edit distances. The default is `5`. * `analyzer` - Sets the analyzer to analyse to suggest text with. Defaults to the search analyzer of the suggest field passed via `field`. * `shard_size` - Sets the maximum number of suggested term to be retrieved from each individual shard. During the reduce phase the only the top N suggestions are returned based on the `size` option. Defaults to `5`. * `text` - Sets the text / query to provide suggestions for. ## Smoothing Models The `phrase` suggester supports multiple smoothing models to balance weight between infrequent grams (grams (shingles) are not existing in the index) and frequent grams (appear at least once in the index). * `laplace` - the default model that uses an additive smoothing model where a constant (typically `1.0` or smaller) is added to all counts to balance weights, The default `alpha` is `0.5`. * `stupid_backoff` - a simple backoff model that backs off to lower order n-gram models if the higher order count is `0` and discounts the lower order n-gram model by a constant factor. The default `discount` is `0.4`. * `linear_interpolation` - a smoothing model that takes the weighted mean of the unigrams, bigrams and trigrams based on user supplied weights (lambdas). Linear Interpolation doesn't have any default values. All parameters (`trigram_lambda`, `bigram_lambda`, `unigram_lambda`) must be supplied. ## Candidate Generators The `phrase` suggester uses candidate generators to produce a list of possible terms per term in the given text. A single candidate generator is similar to a `term` suggester called for each individual term in the text. The output of the generators is subsequently scored in in combination with the candidates from the other terms to for suggestion candidates. Currently only one type of candidate generator is supported, the `direct_generator`. The Phrase suggest API accepts a list of generators under the key `direct_generator` each of the generators in the list are called per term in the original text. ## Direct Generators The direct generators support the following parameters: * `field` - The field to fetch the candidate suggestions from. This is an required option that either needs to be set globally or per suggestion. * `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field. * `size` - The maximum corrections to be returned per suggest text token. * `suggest_mode` - The suggest mode controls what suggestions are included or controls for what suggest text terms, suggestions should be suggested. Three possible values can be specified: * `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default. * `popular` - Only suggest suggestions that occur in more docs then the original suggest text term. * `always` - Suggest any matching suggestions based on terms in the suggest text. * `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value result in an bad request error being thrown. Defaults to 2. * `min_prefix` - The number of minimal prefix characters that must match in order be a candidate suggestions. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms. * `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4. * `max_inspections` - A factor that is used to multiply with the `shards_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5. * `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option. * `max_query_frequency` - The maximum threshold in number of documents a sugges text token can exist in order to be included. Can be a relative percentage number (e.g 0.4) or an absolute number to represent document frequencies. If an value higher than 1 is specified then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly on top of this this also improves the spellcheck performance. The shard level document frequencies are used for this option. * pre_filter - a filter (analyzer) that is applied to each of the tokens passed to this candidate generator. This filter is applied to the original token before candidates are generated. (optional) * post_filter - a filter (analyzer) that is applied to each of the generated tokens before they are passed to the actual phrase scorer. (optional) The following example shows a `phrase` suggest call with two generators, the first one is using a field containing ordinary indexed terms and the second one uses a field that uses terms indexed with a `reverse` filter (tokens are index in reverse order). This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions. The `pre_filter` and `post_filter` options accept ordinary analyzer names. ```json curl -s -XPOST 'localhost:9200/_search' -d { "suggest" : { "text" : "Xor the Got-Jewel", "simple_phrase" : { "phrase" : { "analyzer" : "body", "field" : "bigram", "size" : 4, "real_word_error_likelihood" : 0.95, "confidence" : 2.0, "gram_size" : 2, "direct_generator" : [ { "field" : "body", "suggest_mode" : "always", "min_word_len" : 1 }, { "field" : "reverse", "suggest_mode" : "always", "min_word_len" : 1, "pre_filter" : "reverse", "post_filter" : "reverse" } ] } } } } ``` `pre_filter` and `post_filter` can also be used to inject synonyms after candidates are generated. For instance for the query `captain usq` we might generate a candidate `usa` for term `usq` which is a synonym for `america` which allows to present `captain america` to the user if this phrase scores high enough. Closes #2709	2013-02-28 16:17:59 +01:00
Shay Banon	2bc624806d	not bytes...	2013-02-28 16:02:38 +01:00
Shay Banon	7400c30eba	fail a shard if a merge failure occurs	2013-02-27 23:44:55 +01:00
Shay Banon	e908c723f1	don't log merge failures twice	2013-02-27 20:23:40 +01:00
Simon Willnauer	7be8f431d5	move id tests into SimpleQueryTests	2013-02-27 19:03:42 +01:00
Simon Willnauer	8ab602ec81	Fix AIOOB exception in UID type/id tuple creation. Closes #2695	2013-02-27 18:58:27 +01:00
Shay Banon	3b2d403292	malformed elasticsearch.yml causes unresponsive hang fixes #2693	2013-02-27 18:58:08 +01:00
Drew Raines	cb7a569f4b	Include preference in _count serialization and builder. [#2698 ]	2013-02-27 08:15:02 -06:00
Martijn van Groningen	ffbdc0a4c3	Updated postings format jdocs	2013-02-27 10:46:55 +01:00
Drew Raines	b53a8aff6a	Allow _count to take preference parameter. [#2698 ]	2013-02-26 16:24:52 -06:00
Shay Banon	1e937fd5d1	Allow index: "no" for _type fixes #2696	2013-02-26 22:06:52 +01:00
Martijn van Groningen	7c53d22ce9	Moved resolveClosestNestedObjectMapper to MapperService	2013-02-26 17:48:02 +01:00
Igor Motov	de243493c9	Changing dynamic index and cluster settings should work on master-only nodes Fixes #2675	2013-02-26 08:54:46 -05:00
Shay Banon	bd75b731c6	move to 0.90.0.Beta2 snap	2013-02-26 10:33:57 +01:00
Shay Banon	ab3a59e0bf	release 0.90.0.Beta1	2013-02-26 10:32:50 +01:00
Martijn van Groningen	2b5e3f5586	Fixed resolving closest nested object when sorting on a field inside nested object	2013-02-25 16:21:22 +01:00
Martijn van Groningen	c751df5ee5	Removed unused nested children collector.	2013-02-25 14:13:59 +01:00
Shay Banon	c7a05b1dda	add helper method to know if ObjectMappers have a nested mapping	2013-02-25 13:40:05 +01:00
Shay Banon	6e3300efd3	better error message on nested sorting	2013-02-25 13:32:00 +01:00
Shay Banon	4bb4e49155	Empty list in ids query should not fail, but match no docs relates to #2687	2013-02-25 12:51:34 +01:00
Shay Banon	bde36647fb	Terms/Ids filter: Support empty list of values, resulting in no match for it closes #2687 also closes #2686	2013-02-25 12:26:49 +01:00
Shay Banon	4145d154bb	add a test for empty lookup terms filter	2013-02-25 11:58:58 +01:00
Shay Banon	10ca4d5305	move internal stream facet type lookup to work with bytes	2013-02-25 10:57:18 +01:00
Lukas Vlcek	a42f9491b5	fix typo in exception	2013-02-24 07:47:25 +01:00
Shay Banon	595e0e254e	[Code refactoring] IndicesStats -> IndicesStatsResponse fixes #1782	2013-02-23 14:23:36 +01:00
David Pilato	4c493ac71d	Revert changes on *Request classes from issue Relative to #2657	2013-02-23 10:37:56 +01:00
David Pilato	a646e126e9	Display list of all available site plugins on /_plugins/ end point fix #2664	2013-02-23 09:34:06 +01:00
Shay Banon	eea3a01765	only return 404 on actual index settings missing, on "_all", return 200 relates to #2676	2013-02-22 23:08:38 +01:00
Shay Banon	915019587d	Get settings on empty node fails with ArrayIndexOutOfBoundsException[0] fixes #2676	2013-02-22 23:08:33 +01:00
Igor Motov	b8cc8e56c4	Improve stability of SimpleRobinEngineTests	2013-02-22 14:59:49 -05:00
Shay Banon	ad70105c39	keep the rescorer builder consistent with other builders, without the use of setters	2013-02-22 14:06:39 +01:00
Shay Banon	03fdc6aa80	Query DSL: Terms filter to allow for terms lookup from another document closes #2674	2013-02-22 14:04:10 +01:00
Shay Banon	6978aa2189	mark source as "safe" when copying it over	2013-02-22 12:59:41 +01:00
Shay Banon	a234e45b59	fix boolean to is from get relates to #2657	2013-02-22 12:45:56 +01:00
Igor Motov	ec3492c67c	Improve stability of the testReusePeerRecovery test	2013-02-21 16:06:33 -05:00
Shay Banon	b7f5295674	update jsr166y adn jst166e to latest versions	2013-02-21 21:11:14 +01:00
Shay Banon	4753ffdf1e	allow to set which queue implementation to use expert setting, but still would be great to be able to control it	2013-02-21 20:07:40 +01:00
Ilya Nazarov	da3d682f0e	Check for java-6-openjdk-i386 in init.d There is check for /usr/lib/jvm/java-6-openjdk-amd64, but no for 32-bit systems (/usr/lib/jvm/java-6-openjdk-i386).	2013-02-21 21:13:51 +07:00
Igor Motov	4ea4de6f8d	Add logging information for releasing node lock	2013-02-20 17:53:27 -05:00
Shay Banon	7bb092440a	facet refactoring, default collector base post implementation automatically implement post based on collector	2013-02-20 15:36:11 +01:00
Igor Motov	ce6f0e27bf	Make file distribution among several disks configurable Fixes #2650	2013-02-19 21:43:43 -05:00
David Pilato	b7afa0f44e	Fix test for Support trailing slashes on plugin _site URLs #2654	2013-02-19 21:16:47 +01:00
Martijn van Groningen	3b31c1216e	Made the `term_vector` json field the leading way of configuring term vectors. Supported options: `no`, `yes`, `with_offsets`, `with_positions`, `with_positions_offsets` and`with_positions_offsets_payloads`.	2013-02-19 20:55:43 +01:00
Igor Motov	5b9e9a004a	Make sure that in SitePluginTests http client connects to the correct node and closes the node after the test	2013-02-19 14:42:24 -05:00
Igor Motov	f96c1f1e10	When a node is leaving LocalDiscovery cluster, rerouting should be performed on the master node	2013-02-19 13:14:33 -05:00
Igor Motov	d126558dec	Add check for health timeout to shardCleanup test	2013-02-19 13:12:26 -05:00
David Pilato	8ab9d2dd1f	Support trailing slashes on plugin _site URLs fix #2654	2013-02-19 09:21:45 +01:00
Igor Motov	cfaa859bb2	Improve stability of UpdateNumberOfReplicasTests	2013-02-18 20:12:39 -05:00
Igor Motov	4222478b18	Make it simpler to determine which version of state was used to calculate health	2013-02-18 20:02:29 -05:00
Igor Motov	5746c50ef9	Improve stability of shardsCleanup test	2013-02-18 19:35:12 -05:00
Igor Motov	183a74c866	Improve stability of testSimpleAwareness test	2013-02-18 19:31:07 -05:00
Martijn van Groningen	303e87fb69	Added support for sorting by fields inside one or more nested objects. The sorting by nested field support has the following parameters on top of the already existing sort options: nested_path - Defines the on what nested object to sort. The actual sort field must be a direct field inside this nested object. The default is to use the most immediate inherited nested object from the sort field. nested_filter - A filter the inner objects inside the nested path should match with in order for its field values to be taken into account by sorting. Common case is to repeat the query / filter inside the nested filter or query. By default no nested_filter is active. Either the highest (max) or lowest (min) inner object is picked for during sorting depending on the sort_mode being used. The sort_mode options avg and sum can still be used for number based fields inside nested objects. All the values for the sort field are taken into account for each nested object. Closes #2662	2013-02-18 22:10:41 +01:00
Simon Willnauer	8db436f107	Remove backported Lucene 4 spatial code in favor of the released version in Lucene 4.1	2013-02-18 18:43:55 +01:00
Jeffrey Gerard	0dfc2169d7	Added Testcase and BugFix fixing #2626 where GeoShape intersects filter omitted matching docs. SpatialPrefixTree#recursiveGetNodes uses an optimization that prevents recursion into the deepest tree level if a parent node in the penultimate level covers all its children. This produces a bug if the optimization happens both at indexing and at query/filter time. This patch fixes the bug by disabling the optimization at indexing time (to avoid adding overhead for query-heavy workloads). See LUCENE-4770 for reference	2013-02-18 18:43:47 +01:00
David Pilato	cc83c2f848	refactoring getter/setters Fixes #2657	2013-02-18 11:09:32 -05:00
Martijn van Groningen	ac2e6a3a4d	Fixed nested facets with filters.	2013-02-18 11:01:18 -05:00
Simon Willnauer	24291d40f4	Expose CJKWidthTokenFilter and CJKBigramTokenFilter Closes #2660	2013-02-18 11:01:17 -05:00
Shay Banon	547bd7abf2	add our own bloom filter implementation uses more hash iterations, yet require less memory for the same fpp relates to #2411	2013-02-18 11:01:17 -05:00
Igor Motov	512585da82	Fix race condition in adding TimeoutClusterStateListener Fixes #2658	2013-02-18 11:01:17 -05:00
Shay Banon	435eabd4a0	allow to access the global node settings in a static manner	2013-02-18 11:01:17 -05:00
Shay Banon	e365ecce10	fix check on which settings to change on	2013-02-18 11:01:17 -05:00
Shay Banon	73a447da86	initial facet refactoring the main goal of the facet refactoring is to allow for two modes of facet execution, collector based, that get callbacks as hist match, and post based, which iterates over all the relevant hits it also includes a some simplification of the facet implementation	2013-02-16 02:25:04 +01:00
Shay Banon	06b82a45d4	Simplified range syntax when using a query string closes #2655	2013-02-15 01:30:55 +01:00
Shay Banon	4714a6acc9	Clear cache: allow to invalidate specific filter cache keys closes #2653	2013-02-14 21:13:19 +01:00
Shay Banon	c12c456192	add note on not using totalSize in merge	2013-02-14 14:30:46 +01:00
Shay Banon	e8e3dd1c9d	add 0.20.6 ver	2013-02-14 14:29:30 +01:00
Igor Motov	37f16127c5	Fix ScriptFilter cache key calculation Fixes #2651	2013-02-14 06:13:26 -05:00
Igor Motov	6b49457d9d	Optimize conversion to a cacheable DocIdSet	2013-02-13 21:04:54 -05:00
Shay Banon	883c593d7e	delay reroute only after we publish that a shard has started	2013-02-14 00:10:52 +01:00
Shay Banon	681239b413	Warmers do not load field data cache for sorting on new segments fixes #2649	2013-02-13 17:51:34 +01:00
Shay Banon	f41eccc7a5	updating non dynamic settings throws an error now	2013-02-13 14:28:16 +01:00
Martijn van Groningen	2193a8e401	Let the update index settings action fail if non dynamic settings are changed for open indices. Closes #2647	2013-02-13 13:10:56 +01:00
Shay Banon	5ad540a1aa	possibly incorrect use of Lucene OneMerge.totalBytesSize fixes #2643	2013-02-12 22:09:55 +01:00
Martijn van Groningen	3a2d40acd9	Added more trace logging related to finding master.	2013-02-12 21:40:12 +01:00
Shay Banon	5519f80abb	add increased timeout waiting for relocation when running on small boxes	2013-02-12 21:23:18 +01:00
Martijn van Groningen	fc13499ff5	Added `sort_mode` option that defines what value to pick in the case the sort field is multi-valued. The `min` and `max` sort modes are supported for all field types. Either the lowest value or the highest value is picked. In addition to that number based fields also support `sum` and `avg` as sort mode. If `sum` sort mode is used then all the values for a field and belonging to a document are added together and the result of that is used as sort value. If the `avg` sort mode is used then the average of all values for the sort field belonging to that document is used as sort value. Relates to #2634	2013-02-12 20:38:24 +01:00
Shay Banon	7d13545e33	delete indices before running the tests	2013-02-12 19:28:48 +01:00
Shay Banon	668bcd0eb7	Bulk execution while a shard is replication might send erroneous version conflict failures for certain items fixes #2642	2013-02-12 17:38:06 +01:00
Simon Willnauer	a7bbab7e87	# Rescore Feature The rescore feature allows te rescore a document returned by a query based on a secondary algorithm. Rescoring is commonly used if a scoring algorithm is too costly to be executed across the entire document set but efficient enough to be executed on the Top-K documents scored by a faster retrieval method. Rescoring can help to improve precision by reordering a larger Top-K window than actually returned to the user. Typically is it executed on a window between 100 and 500 documents while the actual result window requested by the user remains the same. # Query Rescorer The `query` rescorer executes a secondary query only on the Top-K results of the actual user query and rescores the documents based on a linear combination of the user query's score and the score of the `rescore_query`. This allows to execute any exposed query as a `rescore_query` and supports a `query_weight` as well as a `rescore_query_weight` to weight the factors of the linear combination. # Rescore API The `rescore` request is defined along side the query part in the json request: ```json curl -s -XPOST 'localhost:9200/_search' -d { "query" : { "match" : { "field1" : { "query" : "the quick brown", "type" : "boolean", "operator" : "OR" } } }, "rescore" : { "window_size" : 50, "query" : { "rescore_query" : { "match" : { "field1" : { "query" : "the quick brown", "type" : "phrase", "slop" : 2 } } }, "query_weight" : 0.7, "rescore_query_weight" : 1.2 } } } ``` Each `rescore` request is executed on a per-shard basis within the same roundtrip. Currently the rescore API has only one implementation (the `query` rescorer) which modifies the result set in-place. Future developments could include dedicated rescore results if needed by the implemenation ie. a pair-wise reranker. Note: Only regualr queries are rescored, if the search type is set to `scan` or `count` rescorers are not executed. Closes #2640	2013-02-12 17:10:00 +01:00
Shay Banon	c65aff7775	Index with no replicas might loose on going documents while relocating a shard fixes #26421	2013-02-12 17:03:28 +01:00
Martijn van Groningen	e54f010a4d	Also support camel case notation for minimal norwegian.	2013-02-12 16:39:11 +01:00
morsegel	ca7920a398	added norwegian minimal stemmer	2013-02-12 16:32:38 +01:00
Igor Motov	f98bd654a8	Fix filter cache stats calculation Fixes #2609	2013-02-11 10:28:53 -05:00
uboness	a2b87e28f6	fixed a bug in PrioritizedThreadPoolExecutor: now execute(Runnable) satisfies the priority and fifo nature of same-priority runnables	2013-02-09 04:20:16 +01:00
uboness	eef3610e12	fixed a bug in PrioritizedThreadPoolExecutor: now execute(Runnable) verifies the command is added as Comparable	2013-02-09 03:33:12 +01:00
uboness	678a8664f6	fixed a bug in PrioritizedThreadPoolExecutor: now execute(Runnable) verifies the command is added as PrioritizedRunnable	2013-02-09 03:26:52 +01:00
uboness	6d9048f8cc	added priority support for cluster state updates: * URGENT: * cluster_reroute (api) * refresh-mapping * cluster_update_settings * reroute_after_cluster_update_settings * create-index * delete-index * index-aliases * remove-index-template * create-index-template * update-mapping * remove-mapping * put-mapping * open-index * close-index * update-settings * HIGH * routing-table-updater * zen-disco-node_left * zen-disco-master_failed * shard-failed * shard-started * NORMAL * all other actions	2013-02-09 01:14:57 +01:00
Simon Willnauer	f5331c9535	Cleanup NumericFieldData. FieldData interfaces are reduced to long and double while internal represenations still operate on the actual datatypes.	2013-02-08 20:58:36 +01:00
Martijn van Groningen	1189a2c2c2	Extended mv sorting integration test	2013-02-08 15:24:56 +01:00
Martijn van Groningen	8c7779057c	Added sort by field that have multiple values per document. Closes #2634	2013-02-08 13:28:40 +01:00
Simon Willnauer	033d6e4306	don't use substraction for comparison if datatypes can overflow	2013-02-08 10:07:31 +01:00
Martijn van Groningen	f97021b165	Fixes size assertion failure.	2013-02-07 16:50:54 +01:00
Martijn van Groningen	e2cb7edb08	Added more info to assert	2013-02-07 13:52:25 +01:00
Martijn van Groningen	e72e323c8a	Attempt to fix "No active shards" failure	2013-02-07 10:14:10 +01:00
Lee Hinman	ed43ad07d7	Throw a more meaningful message when no document is specified for indexing	2013-02-06 22:33:02 +01:00
Florian Schilling	a52e01f3e5	Remove XTermsFilter and UidFilter in favour of Lucene 4.1 TermsFilter	2013-02-06 18:45:05 +01:00
Igor Motov	6890c9fa62	Move action.wait_on_mapping_change setting to pom	2013-02-06 11:48:58 -05:00
Igor Motov	ed09ba0a18	Improve stability of RecoveryPercolatorTests Without "action.wait_on_mapping_change" setting set to true, the test node might get shutdown before updated mapping is saved.	2013-02-05 14:53:46 -05:00
Igor Motov	8277833f8d	Fix settings processing in WordDelimiterTokenFilterFactory	2013-02-05 10:03:00 -05:00
Martijn van Groningen	19295280d9	Made sure that wrapped child query / parent query gets rewritten only once.	2013-02-05 10:27:31 +01:00
Igor Motov	9e89323ad2	Add proper cleanup to InternalSettingsPerparerTests	2013-02-04 19:58:40 -05:00
Martijn van Groningen	bc667c378e	Made SoftWrapper fields final.	2013-02-04 14:47:36 +01:00
Martijn van Groningen	8109d13733	Use CacheRecycler when resolving parent docs in TopChildrenQuery.	2013-02-04 12:46:30 +01:00
Martijn van Groningen	9c3a86875b	Removed `execution_type` for has_child and has_parent.	2013-02-04 11:37:40 +01:00
Igor Motov	20ce01bd53	Add additional query validation to the terms query parser Fixes #2608	2013-02-03 09:44:16 -05:00
Shay Banon	ebc0c8cc6d	when we fix maxMergeAtOnce, make sure to not set it to 1 as its an illegal value	2013-02-01 19:00:01 +01:00
Shay Banon	a8c9e580ed	add getMaxOrd, and properly document the difference between it and numOrds	2013-02-01 16:13:13 +01:00
Shay Banon	6f1932ab67	support yaml detection on char sequence	2013-02-01 12:46:19 +01:00
Simon Willnauer	6468c15446	check for == 0 rather than > 0	2013-02-01 11:11:47 +01:00
Simon Willnauer	c18ae4a194	fix getMemorySizeInBytes in SparseMultiArrayOrdinals	2013-02-01 11:09:09 +01:00
Igor Motov	45b2bff8da	Improve SearchStatsTests Added refresh to guarantee that at least something will be fetched on a fast computer.	2013-01-31 21:19:08 -05:00
Igor Motov	ca635deb36	Allow health to be executed on a local node instead of the master	2013-01-31 21:19:08 -05:00
Igor Motov	3c9541dd14	Make facet and sort tests more reliable in case of multiple nodes and shards Stats, histogram and range facets and sorting currently fail if a field that they are running on is not defined in the mapping. In case of dynamic fields it might mean that by the time the facet query is executed the new field mapping might not be propagated to all nodes yet.	2013-01-31 21:19:07 -05:00
Igor Motov	6a01e7882c	Improve shardsCleanup test When startNode exits there is no guarantee that shard cleanup is finished because the cleanup operation is performed on another thread and startNode doesn't wait for it to complete. Therefore we might need to wait for the shard to disappear.	2013-01-31 21:18:14 -05:00
Igor Motov	e32efba3d8	Improve RecoverAfterNodes tests	2013-01-31 20:05:55 -05:00
Martijn van Groningen	5e811e5382	Another small TopChildrenQuery cleanup.	2013-01-31 23:49:32 +01:00
Martijn van Groningen	7ef65688cd	- TopChildrenQuery cleanup. - Added class level jdocs for TopChildrenQuery and ChildrenQuery.	2013-01-31 23:38:09 +01:00
Simon Willnauer	1a1df06411	Move OrdsBuilding into a dedicated class and abstract integer pools used to build sparse ordinals	2013-01-31 19:02:31 +01:00
Martijn van Groningen	1f50b07406	Initial parent/child queries cleanup.	2013-01-31 18:39:31 +01:00
Martijn van Groningen	371b071fb7	Added notion of Rewrite that replaces ScopePhase	2013-01-31 17:24:46 +01:00
Martijn van Groningen	d4ef4697d5	Also remove scope from facet builders. Fixes build.	2013-01-31 16:34:45 +01:00
Martijn van Groningen	46dd42920c	Remove scope support in query and facet dsl. Remove support for the `scope` field in facets and `_scope` field in the nested and parent/child queries. The scope support for nested queries will be replaced by the `nested` facet option and a facet filter with a nested filter. The nested filters will now support the a `join` option. Which controls whether to perform the block join. By default this enabled, but when disabled it returns the nested documents as hits instead of the joined root document. Search request with the current scope support. ``` curl -s -XPOST 'localhost:9200/products/_search' -d '{ "query" : { "nested" : { "path" : "offers", "query" : { "match" : { "offers.color" : "blue" } }, "_scope" : "my_scope" } }, "facets" : { "size" : { "terms" : { "field" : "offers.size" }, "scope" : "my_scope" } } }' ``` The following will be functional equivalent of using the scope support: ``` curl -s -XPOST 'localhost:9200/products/_search?search_type=count' -d '{ "query" : { "nested" : { "path" : "offers", "query" : { "match" : { "offers.color" : "blue" } } } }, "facets" : { "size" : { "terms" : { "field" : "offers.size" }, "facet_filter" : { "nested" : { "path" : "offers", "query" : { "match" : { "offers.color" : "blue" } }, "join" : false } }, "nested" : "offers" } } }' ``` The scope support for parent/child queries will be replaced by running the child query as filter in a global facet. Search request with the current scope support: ``` curl -s -XPOST 'localhost:9200/products/_search' -d '{ "query" : { "has_child" : { "type" : "offer", "query" : { "match" : { "color" : "blue" } }, "_scope" : "my_scope" } }, "facets" : { "size" : { "terms" : { "field" : "size" }, "scope" : "my_scope" } } }' ``` The following is the functional equivalent of using the scope support with parent/child queries: ``` curl -s -XPOST 'localhost:9200/products/_search' -d '{ "query" : { "has_child" : { "type" : "offer", "query" : { "match" : { "color" : "blue" } } } }, "facets" : { "size" : { "terms" : { "field" : "size" }, "global" : true, "facet_filter" : { "term" : { "color" : "blue" } } } } }' ``` Closes #2606	2013-01-31 15:09:57 +01:00
Martijn van Groningen	355381962b	Use only the 'test' index, instead of all indices for child search benchmark.	2013-01-31 13:12:33 +01:00
Shay Banon	6cec73c201	remove fuzzy factor from mapping (internally implemented) we want to support ~ notion in query parser for types other than strings, we are getting there, one can do now age:10~5, we would love to support it for dates, as in timestamp:2012-10-10~5d, but that requires changes in the query parser to support strings after the ~ sign	2013-01-31 12:23:03 +01:00
Igor Motov	8df7f2af0d	Improve testReusePeerRecovery test	2013-01-30 19:51:41 -05:00
Igor Motov	29f4274213	Add index cleanup if index creation fails Fixes #2590	2013-01-30 10:40:01 -05:00
Shay Banon	5c40c97e6e	Id Cache: Allow to configure if ids should be reused (memory wise) or not, default to false closes #2605	2013-01-30 14:42:07 +01:00
Martijn van Groningen	bc20f068c9	Made `search_analyzer` updateable via put mapping api. Closes #2604	2013-01-30 11:49:20 +01:00
Martijn van Groningen	e074e00f76	Fielddata: Moved the growing logic to IntArrayRef	2013-01-30 11:20:41 +01:00
Martijn van Groningen	f7692aeef2	Fielddata: IntArrayRef is initialized with small array and grows if needed	2013-01-30 10:57:52 +01:00
Simon Willnauer	5df37eaf75	add more advanced tests for phrase_prefix	2013-01-30 10:51:05 +01:00
Shay Banon	f5e55b7cb9	properly print JVM version	2013-01-29 20:25:13 +01:00
Shay Banon	0568284147	reduce the memory needed while building the sparse array ordinals	2013-01-29 20:23:54 +01:00
Shay Banon	716f2aebbb	add 0.20.5	2013-01-29 10:14:25 +01:00
Simon Willnauer	0697e2f23e	use index prefix in tests to prevent misconfiguration	2013-01-28 15:51:06 +01:00
Simon Willnauer	72a2416a8c	Support MultiPhrasePrefixQuery and MultiPhraseQuery in highlighters Closes #2596	2013-01-28 15:41:25 +01:00
Martijn van Groningen	2e68207d6d	Updated suggest api. # Suggest feature The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment there the only supported suggester is `fuzzy`. The suggest feature is available from version `0.21.0`. # Fuzzy suggester The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query into account that is part of request. # Suggest API The suggest request part is defined along side the query part as top field in the json request. ``` curl -s -XPOST 'localhost:9200/_search' -d '{ "query" : { ... }, "suggest" : { ... } }' ``` Several suggestions can be specified per request. Each suggestion is identified with an arbitary name. In the example below two suggestions are requested. Both `my-suggest-1` and `my-suggest-2` suggestions use the `fuzzy` suggester, but have a different `text`. ``` "suggest" : { "my-suggest-1" : { "text" : "the amsterdma meetpu", "fuzzy" : { "field" : "body" } }, "my-suggest-2" : { "text" : "the rottredam meetpu", "fuzzy" : { "field" : "title", } } } ``` The below suggest response example includes the suggestion response for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains entries. Each entry is effectively a token from the suggest text and contains the suggestion entry text, the original start offset and length in the suggest text and if found an arbitary number of options. ``` { ... "suggest": { "my-suggest-1": [ { "text" : "amsterdma", "offset": 4, "length": 9, "options": [ ... ] }, ... ], "my-suggest-2" : [ ... ] } ... } ``` Each options array contains a option object that includes the suggested text, its document frequency and score compared to the suggest entry text. The meaning of the score depends on the used suggester. The fuzzy suggester's score is based on the edit distance. ``` "options": [ { "text": "amsterdam", "freq": 77, "score": 0.8888889 }, ... ] ``` # Global suggest text To avoid repitition of the suggest text, it is possible to define a global text. In the example below the suggest text is defined globally and applies to the `my-suggest-1` and `my-suggest-2` suggestions. ``` "suggest" : { "text" : "the amsterdma meetpu" "my-suggest-1" : { "fuzzy" : { "field" : "title" } }, "my-suggest-2" : { "fuzzy" : { "field" : "body" } } } ``` The suggest text can in the above example also be specied as suggestion specific option. The suggest text specified on suggestion level override the suggest text on the global level. # Other suggest example. In the below example we request suggestions for the following suggest text: `devloping distibutd saerch engies` on the `title` field with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but a nice optimalization. The suggestions are gather in the `query` phase and in the case that we only care about suggestions (so no hits) we don't need to execute the `fetch` phase. ``` curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{ "suggest" : { "my-title-suggestions-1" : { "text" : "devloping distibutd saerch engies", "fuzzy" : { "size" : 3, "field" : "title" } } } }' ``` The above request could yield the response as stated in the code example below. As you can see if we take the first suggested options of each suggestion entry we get `developing distributed search engines` as result. ``` { ... "suggest": { "my-title-suggestions-1": [ { "text": "devloping", "offset": 0, "length": 9, "options": [ { "text": "developing", "freq": 77, "score": 0.8888889 }, { "text": "deloping", "freq": 1, "score": 0.875 }, { "text": "deploying", "freq": 2, "score": 0.7777778 } ] }, { "text": "distibutd", "offset": 10, "length": 9, "options": [ { "text": "distributed", "freq": 217, "score": 0.7777778 }, { "text": "disributed", "freq": 1, "score": 0.7777778 }, { "text": "distribute", "freq": 1, "score": 0.7777778 } ] }, { "text": "saerch", "offset": 20, "length": 6, "options": [ { "text": "search", "freq": 1038, "score": 0.8333333 }, { "text": "smerch", "freq": 3, "score": 0.8333333 }, { "text": "serch", "freq": 2, "score": 0.8 } ] }, { "text": "engies", "offset": 27, "length": 6, "options": [ { "text": "engines", "freq": 568, "score": 0.8333333 }, { "text": "engles", "freq": 3, "score": 0.8333333 }, { "text": "eggies", "freq": 1, "score": 0.8333333 } ] } ] } ... } ``` # Common suggest options: * `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion. # Common fuzzy suggest options * `field` - The field to fetch the candidate suggestions from. This is an required option that either needs to be set globally or per suggestion. * `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field. * `size` - The maximum corrections to be returned per suggest text token. * `sort` - Defines how suggestions should be sorted per suggest text term. Two possible value: `score` - Sort by sore first, then document frequency and then the term itself. `frequency` - Sort by document frequency first, then simlarity score and then the term itself. * `suggest_mode` - The suggest mode controls what suggestions are included or controls for what suggest text terms, suggestions should be suggested. Three possible values can be specified: `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default. `popular` - Only suggest suggestions that occur in more docs then the original suggest text term. ** `always` - Suggest any matching suggestions based on terms in the suggest text. # Other fuzzy suggest options: * `lowercase_terms` - Lower cases the suggest text terms after text analyzation. * `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value result in an bad request error being thrown. Defaults to 2. * `min_prefix` - The number of minimal prefix characters that must match in order be a candidate suggestions. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms. * `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4. * `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Due to the fact that terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise. * `max_inspections` - A factor that is used to multiply with the `shards_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5. * `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option. * `max_query_frequency` - The maximum threshold in number of documents a sugges text token can exist in order to be included. Can be a relative percentage number (e.g 0.4) or an absolute number to represent document frequencies. If an value higher than 1 is specified then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly on top of this this also improves the spellcheck performance. The shard level document frequencies are used for this option.	2013-01-28 15:18:18 +01:00
Simon Willnauer	48488f707f	Expose CommonTermsQuery in Match & MultiMatch and enable highlighting Closes #2591	2013-01-28 11:57:05 +01:00
Shay Banon	bfdf8fe590	Indexes created from index request might not replica initial doc to replica fixes #2594	2013-01-28 11:29:32 +01:00
Shay Banon	9539661d40	move facet reduce from facet process to the actual facet this will simplify execution, and actually let the process just be a parser (rename will probably happen)	2013-01-27 13:45:38 +01:00
Shay Banon	360d7d9425	default for paged_bytes for string type less memory overhead, though a bit slower on the execution side for facets, and might require more memory per facet execution	2013-01-26 15:11:14 +01:00
Simon Willnauer	5c89d66216	move ShardsAllocatorModuleTests to o.e.t.integration	2013-01-25 22:26:30 +01:00
Shay Banon	41cfe9cc27	add 0.20.4	2013-01-25 22:02:34 +01:00
Shay Banon	042a5d02d9	Primary shard failure with initializing replica shards can cause the replica shard to cause allocation failures fixes #2592	2013-01-25 17:59:01 +01:00
Simon Willnauer	a7bb3c29f2	Propagate exception during recovery if segement info can not be opended but should	2013-01-25 15:25:48 +01:00
Shay Banon	1be84c273b	eagerly reroute when a node leaves the cluster	2013-01-25 15:23:05 +01:00
Martijn van Groningen	a1ef1f02cc	Exposed IndexOptions#DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS setting.	2013-01-25 00:02:43 +01:00
Shay Banon	45ed9ddba7	cleanup ordinals in field data	2013-01-24 22:31:52 +01:00
Shay Banon	990acff4f7	make sure we wait for yellow stats in suggest API when searching on clean index	2013-01-24 22:31:51 +01:00
Martijn van Groningen	f974a17229	Removed AbstractFragmentsBuilder. Lucene's BaseFragmentsBuilder has now discrete multivalued highlighting and better support for requesting large number of fragments.	2013-01-24 22:15:07 +01:00
Martijn van Groningen	e56b279624	Made BlockJoinScorer#freq() method handle freqs correctly (as is done in ToParentBlockJoinQuery)	2013-01-24 21:52:56 +01:00
Martijn van Groningen	9013eeae8a	Added filter support in the `has_child` and `has_parent` filters. Example: ``` curl -XPOST 'localhost:9200/_search' -d '{ "query": { "filtered_query": { "query": { "match": { "title": "distributed systems" } }, "filter": { "has_child": { "type": "tag", "filter": { "term": { "name": "book" } } } } } } }' ``` Closes #2585	2013-01-24 21:32:38 +01:00
Shay Banon	a39469a252	gather the field data that are changed (we will make use of that later)	2013-01-24 15:55:23 +01:00
Martijn van Groningen	98a674fc6e	Added suggest api. # Suggest feature The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment there the only supported suggester is `fuzzy`. The suggest feature is available since version `0.21.0`. # Fuzzy suggester The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query into account that is part of request. # Suggest API The suggest request part is defined along side the query part as top field in the json request. ``` curl -s -XPOST 'localhost:9200/_search' -d '{ "query" : { ... }, "suggest" : { ... } }' ``` Several suggestions can be specified per request. Each suggestion is identified with an arbitary name. In the example below two suggestions are requested. The `my-suggest-1` suggestion uses the `body` field and `my-suggest-2` uses the `title` field. The `type` field is a required field and defines what suggester to use for a suggestion. ``` "suggest" : { "suggestions" : { "my-suggest-1" : { "type" : "fuzzy", "field" : "body", "text" : "the amsterdma meetpu" }, "my-suggest-2" : { "type" : "fuzzy", "field" : "title", "text" : "the rottredam meetpu" } } } ``` The below suggest response example includes the suggestions part for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains a terms array, that contains all terms outputted by the analyzed suggest text. Each term object includes the term itself, the original start and end offset in the suggest text and if found an arbitary number of suggestions. ``` { ... "suggest": { "my-suggest-1": { "terms" : [ { "term" : "amsterdma", "start_offset": 5, "end_offset": 14, "suggestions": [ ... ] } ... ] }, "my-suggest-2" : { "terms" : [ ... ] } } ``` Each suggestions array contains a suggestion object that includes the suggested term, its document frequency and score compared to the suggest text term. The meaning of the score depends on the used suggester. The fuzzy suggester's score is based on the edit distance. ``` "suggestions": [ { "term": "amsterdam", "frequency": 77, "score": 0.8888889 }, ... ] ``` # Global suggest text To avoid repitition of the suggest text, it is possible to define a global text. In the example below the suggest text is a global option and applies to the `my-suggest-1` and `my-suggest-2` suggestions. ``` "suggest" : { "suggestions" : { "text" : "the amsterdma meetpu", "my-suggest-1" : { "type" : "fuzzy", "field" : "title" }, "my-suggest-2" : { "type" : "fuzzy", "field" : "body" } } } ``` The suggest text can be specied as global option or as suggestion specific option. The suggest text specified on suggestion level override the suggest text on the global level. # Other suggest example. In the below example we request suggestions for the following suggest text: `devloping distibutd saerch engies` on the `title` field with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but a nice optimalization. The suggestions are gather in the `query` phase and in the case that we only care about suggestions (so no hits) we don't need to execute the `fetch` phase. ``` curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{ "suggest" : { "suggestions" : { "my-title-suggestions" : { "suggester" : "fuzzy", "field" : "title", "text" : "devloping distibutd saerch engies", "size" : 3 } } } }' ``` The above request could yield the response as stated in the code example below. As you can see if we take the first suggested term of each suggest text term we get `developing distributed search engines` as result. ``` { ... "suggest": { "my-title-suggestions": { "terms": [ { "term": "devloping", "start_offset": 0, "end_offset": 9, "suggestions": [ { "term": "developing", "frequency": 77, "score": 0.8888889 }, { "term": "deloping", "frequency": 1, "score": 0.875 }, { "term": "deploying", "frequency": 2, "score": 0.7777778 } ] }, { "term": "distibutd", "start_offset": 10, "end_offset": 19, "suggestions": [ { "term": "distributed", "frequency": 217, "score": 0.7777778 }, { "term": "disributed", "frequency": 1, "score": 0.7777778 }, { "term": "distribute", "frequency": 1, "score": 0.7777778 } ] }, { "term": "saerch", "start_offset": 20, "end_offset": 26, "suggestions": [ { "term": "search", "frequency": 1038, "score": 0.8333333 }, { "term": "smerch", "frequency": 3, "score": 0.8333333 }, { "term": "serch", "frequency": 2, "score": 0.8 } ] }, { "term": "engies", "start_offset": 27, "end_offset": 33, "suggestions": [ { "term": "engines", "frequency": 568, "score": 0.8333333 }, { "term": "engles", "frequency": 3, "score": 0.8333333 }, { "term": "eggies", "frequency": 1, "score": 0.8333333 } ] } ] } } ... } ``` # Common suggest options: * `suggester` - The suggester implementation type. The only supported value is 'fuzzy'. This is a required option. * `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion. # Common fuzzy suggest options * `field` - The field to fetch the candidate suggestions from. This is an required option that either needs to be set globally or per suggestion. * `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field. * `size` - The maximum corrections to be returned per suggest text token. * `sort` - Defines how suggestions should be sorted per suggest text term. Two possible value: `score` - Sort by sore first, then document frequency and then the term itself. `frequency` - Sort by document frequency first, then simlarity score and then the term itself. * `suggest_mode` - The suggest mode controls what suggestions are included or controls for what suggest text terms, suggestions should be suggested. Three possible values can be specified: `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default. `popular` - Only suggest suggestions that occur in more docs then the original suggest text term. ** `always` - Suggest any matching suggestions based on terms in the suggest text. # Other fuzzy suggest options: * `lowercase_terms` - Lower cases the suggest text terms after text analyzation. * `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value result in an bad request error being thrown. Defaults to 2. * `min_prefix` - The number of minimal prefix characters that must match in order be a candidate suggestions. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms. * `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4. * `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Due to the fact that terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise. * `max_inspections` - A factor that is used to multiply with the `shards_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5. * `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option. * `max_query_frequency` - The maximum threshold in number of documents a sugges text token can exist in order to be included. Can be a relative percentage number (e.g 0.4) or an absolute number to represent document frequencies. If an value higher than 1 is specified then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly on top of this this also improves the spellcheck performance. The shard level document frequencies are used for this option. Closes #2585	2013-01-24 15:41:06 +01:00
Shay Banon	9673a1c366	expose field data settings in mapping, they can be updated using merge mapping	2013-01-24 15:33:24 +01:00
Simon Willnauer	4eefcb9c82	Expose CommonTermsQuery Closes #2583	2013-01-24 14:18:01 +01:00
Simon Willnauer	c4eab90b2e	Cleanup MatchQuery	2013-01-24 14:11:56 +01:00
Shay Banon	c2f35621f6	allow to get settings as delimited string	2013-01-24 12:03:16 +01:00
Shay Banon	b143822bac	allow to load settings from delimited string	2013-01-24 12:00:14 +01:00
Simon Willnauer	88f68264c7	Reuse MemoryIndex instances across Percolator requests. * added configurable MemoryIndexPool that pools MemoryIndex instance across Threads * Pool can be configured based on the number of pooled instances as well as the maximum number of bytes that is reused across the pooled instances Closes #2581	2013-01-24 11:53:21 +01:00
Shay Banon	e8c1180ede	add field data stats	2013-01-24 11:38:18 +01:00
Shay Banon	613b746299	move field data type to simply be type and settings	2013-01-24 09:33:16 +01:00
Martijn van Groningen	50ac477d92	Fixed small bug. Index name should be used to lookup entry.	2013-01-23 23:53:20 +01:00
Shay Banon	4967a97faf	don't use private since its accessed from inner class, remove $$ need	2013-01-23 22:17:27 +01:00
Martijn van Groningen	346422b747	Added sparse multi ordinals implementation for field data.	2013-01-23 22:11:31 +01:00
Daniel Muller	9e79f54cb1	Check for java-6-openjdk-amd64	2013-01-23 18:34:37 +01:00
synhershko	e0f711a94a	Updating Lucene version	2013-01-23 16:18:18 +02:00
Shay Banon	a74e7f8099	refactor geo to extract common classes	2013-01-23 14:14:21 +01:00
Simon Willnauer	9c729fad2c	remove flush check IW#commit always adds a commit point now even if nothing has changed ie. docs are added, updated or deleted.	2013-01-23 14:06:01 +01:00
Shay Banon	22f0e79a84	use merge trigger to control when to do merges now with merge trigger, we can simply decide when to do merges based on it	2013-01-23 13:24:20 +01:00
Shay Banon	d969e61999	Remove settings option for index store compression, compression is always enabled closes #2577	2013-01-23 13:11:48 +01:00
Simon Willnauer	2880cd0172	Upgrade to Lucene 4.1 * Removed CustmoMemoryIndex in favor of MemoryIndex which as of 4.1 supports adding the same field twice * Replaced duplicated logic in X[]FSDirectory for rate limiting with a RateLimitedFSDirectory wrapper Remove hacks to find out merge context in rate limiting in favor of IOContext * replaced Scorer#freq() return type (from float to int) * Upgraded FVHighlighter to new 'centered' highlighting * Fixed RobinEngine to use seperate setCommitData	2013-01-23 11:54:11 +01:00
Shay Banon	20f43bf54c	add hasSingleArrayBackingStorage allow for optimization only when there really is a single array, and not when there is a multi dimensional one	2013-01-23 10:24:43 +01:00
Igor Motov	bbfd3957eb	Improve stability of the testNodesInfos test	2013-01-22 12:29:38 -05:00
Igor Motov	9becdb814a	Improve stability of the shardsCleanup test	2013-01-22 10:20:18 -05:00
Shay Banon	c295211a85	final move to new field data	2013-01-22 16:16:33 +01:00
Shay Banon	27bfb341ff	better logging on missing format, and allow to configure format on a type on the index level	2013-01-22 16:16:33 +01:00
uboness	09cc70b8c9	added predefined empty implementation for all atomic field datas	2013-01-22 16:16:33 +01:00
Shay Banon	6b92b592b4	allow to clear by reader the new field data cache	2013-01-22 16:16:32 +01:00
Shay Banon	c67386f644	properly invalidate on core closed reader	2013-01-22 16:16:32 +01:00
Shay Banon	af757fd821	more usage of field data note, removed field data from cache stats, it will have its own stats later on (cache part is really misleading)	2013-01-22 16:16:32 +01:00
Shay Banon	de013babf8	move geo filters and numeric range to use new field data	2013-01-22 16:16:32 +01:00
Shay Banon	be1e5becbb	move scripts to use new field data	2013-01-22 16:16:32 +01:00
Shay Banon	772ee9db54	move terms to use new field data	2013-01-22 16:16:32 +01:00
Shay Banon	e5b651321f	remove some safe methods because of the new makeSafe method usage	2013-01-22 16:16:32 +01:00
Shay Banon	f189a832c5	grr pages -> paged	2013-01-22 16:16:32 +01:00
Shay Banon	5b7173fc35	move sorting to work with new field data	2013-01-22 16:16:32 +01:00
uboness	b739bf97d4	added missing dedicated value comparators for the different indices field data	2013-01-22 16:16:32 +01:00
Shay Banon	45f27fe96a	add packed bytes variant for strings/bytes	2013-01-22 16:16:32 +01:00
uboness	855b64a8a7	byte field data implementation	2013-01-22 16:16:31 +01:00
uboness	f1f3c241fd	short field data implementation	2013-01-22 16:16:31 +01:00
uboness	3840439365	float field data implementation	2013-01-22 16:16:31 +01:00
Shay Banon	9137fcc6fc	move geo distance sorting to use new field data	2013-01-22 16:16:31 +01:00
Shay Banon	d5e70a27df	integer type to support int field data type	2013-01-22 16:16:31 +01:00
uboness	fc09ce7ac9	Implemented int field data	2013-01-22 16:16:31 +01:00
Shay Banon	d82859c82b	geo point new field mapper with geo distance facet based impl	2013-01-22 16:16:31 +01:00
Shay Banon	2e86081f7b	use smartNameMapper on context	2013-01-22 16:16:31 +01:00
Shay Banon	d88e3f73ac	add specific makeSafe method to make an unsafe (shared) bytes based value to a "safe" one	2013-01-22 16:16:31 +01:00
Shay Banon	1765b0b813	date histogram to use new field data	2013-01-22 16:16:31 +01:00
Shay Banon	37acba1b57	terms stats to use new field data	2013-01-22 16:16:31 +01:00
Shay Banon	f1f86efed5	move statistical facet to use new field data	2013-01-22 16:16:30 +01:00
Shay Banon	699ff2782e	move histogram facet to use new field data	2013-01-22 16:16:30 +01:00
Shay Banon	8c7e0f5ca1	fix getOrds on single array ords	2013-01-22 16:16:30 +01:00
Shay Banon	fa363b2dca	move range facet to use new field data abstraction	2013-01-22 16:16:30 +01:00
Shay Banon	692413862a	add clear when deleting an index for the field data service	2013-01-22 16:16:30 +01:00
Shay Banon	a39ca58de9	add field data service to index level services	2013-01-22 16:16:30 +01:00
Shay Banon	2d91939253	add initial field data type support to mappers hardwired and still happily leaves with current field data impl	2013-01-22 16:16:30 +01:00
Shay Banon	e0b280f9b3	use FieldMapper.Names for fieldNames, and not just fieldName as string	2013-01-22 16:16:30 +01:00
Shay Banon	7dc5cf9799	add long field support	2013-01-22 16:16:30 +01:00
Shay Banon	7397007e05	initial commit	2013-01-22 16:16:30 +01:00
Clinton Gormley	7cfdd9ef59	Corrected filter strategy option in FilteredQueryParser Changed from 'query_filter' to 'query_first'	2013-01-22 12:54:00 +01:00
Simon Willnauer	0b730aae81	Pass on filterStrategy in XFilteredQuery if query is rewritten	2013-01-22 12:40:21 +01:00
Martijn van Groningen	a5bd57ed6c	Added trace log statement, to catch stacktraces	2013-01-20 23:17:18 +01:00
Simon Willnauer	35cf9ee11d	wait for cluster to be formed in SimpleNodesInfoTests	2013-01-19 15:44:26 +01:00
Simon Willnauer	d6b613ac8c	Respect lowercase_expanded_terms in MappingQueryParser Fixes #2566	2013-01-19 13:57:45 +01:00
Simon Willnauer	31fd521fd1	provide more information if a null DocumentMapper is returned	2013-01-18 16:43:56 +01:00
Simon Willnauer	c563248f76	testMoreLikeThisIssue2197 should create index mapping first to prevent races	2013-01-18 16:41:37 +01:00
Simon Willnauer	6f38a3a8a8	create index and mapping first to ensure all relevant nodes see the mapping	2013-01-18 16:09:24 +01:00
Simon Willnauer	393de984bd	Remove deprecated StreamInput/Output#read/writeUTF	2013-01-17 22:38:42 +01:00
Simon Willnauer	d37c844da0	use camelcase for getters	2013-01-17 22:27:44 +01:00
Simon Willnauer	3d80c53192	Allow ShardsAllocator to be configured via node level settings. * Default ShardsAllocator is set to BalancedShardsAllocator * Core ShardsAllocator implementations can be defined via 'cluster.routing.allocation.type' * Core ShardsAllocator implementations are exposed via short keys 'balanced' (BalancedShardsAllocator) and 'even_shards' (EvenShardsCountAllocator) * Third party allocators can be loaded via fully-qualified class names. Closes #2557	2013-01-17 16:23:52 +01:00
Simon Willnauer	2eb09e6b1a	Added BalancedShardsAllocator that balances shards based on a weight function. * Weights are calculated per index and incorporate index level, global and primary related parameters * Balance operations are executed based on a win maximation strategy that tries to relocate shards first that offer the biggest gain towards the weight functions optimum * The WeightFunction allows settings to prefer index based balance over global balance and vice versa * Balance operations can be throttled by raising a threshold resulting in less agressive balance operations * WeightFunction shipps with defaults to achive evenly distributed indexes while maintaining a global balance Closes #2555	2013-01-17 12:02:42 +01:00
Igor Motov	d97839b8a8	Fix char filter issues introduced during lucene 4 migration Fixes #2543	2013-01-14 12:43:02 -05:00
Igor Motov	e82f96f1e5	Make script cache configurable and bounded Fixes #2539	2013-01-14 06:57:13 -05:00
Igor Motov	6243f8e64d	Disallow unknown custom indexing parameters Fixes #2354	2013-01-11 10:14:25 -05:00
Martijn van Groningen	1ce10dfb06	Fixed issue where parent & child queries can fail if a segment doesn't have documents with the targeted type or associated parent type Closes #2537	2013-01-11 16:06:14 +01:00
Martijn van Groningen	43aabe88e8	Fixed document already exists error when concurrently sending update request with upsert using the same id. Closes #2530	2013-01-10 14:25:44 +01:00
Shay Banon	6f7253c524	Comments are not allowed in mapping checked jackson, there won't be an overhead in enabling comments. Added, with the caveat that when used with mappings, and calling "get mapping", the comments will not be returned closes #1394	2013-01-07 04:21:41 +01:00
Shay Banon	2c4b9d9ba2	cleanup queryHint since its not was never used preference ended up as the way to control routing	2013-01-07 04:02:45 +01:00
Shay Banon	bcdda811ef	add read/write optional text	2013-01-07 02:54:22 +01:00

... 3 4 5 6 7 ...

1565 Commits