OpenSearch

Commit Graph

Author	SHA1	Message	Date
Shay Banon	566d1d13f7	fix javadoc	2013-03-18 22:04:31 +01:00
Clinton Gormley	2123ab591c	Correct filter strategy opt: random_access_random to random_access_always	2013-03-18 20:17:26 +01:00
Shay Banon	7d9cef904b	Field Data: optimize long type to use narrowest possible type automatically closes #2795	2013-03-18 12:37:15 +01:00
Shay Banon	82072fc47f	make ES compile with java 8 - that isAnnotationPresent bug is known, and probably will be fixed in later versions, but it costs us nothing to not use it now - some tests fail, mainly due to consistent ordering expected from Map (within versions) which does not seem to be preserved, need to fix those tests to be agnostic to it	2013-03-18 01:33:09 +01:00
Shay Banon	e347a626da	use ImmutableList.Builder instead of ArrayList	2013-03-17 21:55:07 +01:00
Shay Banon	2ed6ea25cc	fix logging message to include the index also add the list of current indices	2013-03-16 22:58:45 +01:00
Shay Banon	111a13222e	Mapping: dynamic flag is explicitly returned even when not set fixes #2789	2013-03-16 01:29:22 +01:00
Simon Willnauer	c25eb7defe	Fix bug in RateLimiter.SimpleRateLimiter causing numeric overflow in StoreStats Closes #2785	2013-03-15 23:36:31 +01:00
Shay Banon	d5da8f22ff	improve TODO comment	2013-03-15 21:46:02 +01:00
Simon Willnauer	0e3b88be35	add CamelCase support to Suggester where missing	2013-03-15 15:07:15 +01:00
Simon Willnauer	e0eff7d9d3	Remove `sort_order` and `sort_mode` in favor of `order` and `mode` Closes #2781	2013-03-15 13:57:39 +01:00
Simon Willnauer	33608c333f	Add `sort_order` and `sortOrder` as valid field names for defining the sort order in a Sort object. Closes #2767	2013-03-15 08:42:19 +01:00
Simon Willnauer	5f20d81199	Make StupidBackoff the default smoothing model for phrase suggester Closes #2780	2013-03-14 23:03:15 +01:00
Shay Banon	91c51ef05c	minor cleanup suggest api - make sure we close the parser - fail when no content is provided in the rest request - reuse the suggest parse element	2013-03-13 12:18:14 -07:00
Florian Schilling	25bd9cecd0	# REST Suggester API The REST Suggester API binds the 'Suggest API' to the REST layer directly. Hence there is no need to touch the query layer when requesting suggestions. This API extracts the Phrase Suggester API and makes 'suggestion requests' top-level objects in suggestion requests. The complete API can be found in the underlying ["Suggest Feature API"](http://www.elasticsearch.org/guide/reference/api/search/suggest.html). # API Example The following example shows how Suggest actions work on the REST layer: a simple request is shown along with its response. ## Suggestion Request ```json curl -XPOST 'localhost:9200/_suggest?pretty=true' -d '{ "text" : "Xor the Got-Jewel", "simple_phrase" : { "phrase" : { "analyzer" : "bigram", "field" : "bigram", "size" : 1, "real_word_error_likelihood" : 0.95, "max_errors" : 0.5, "gram_size" : 2 } } }' ``` This example shows how to request a suggestion for the global text 'Xor the Got-Jewel'. A 'simple phrase' suggestion is requested and a 'direct generator' is configured to generate the candidates. ## Suggestion Response On success, the request above replies with a response like the following: ```json { "simple_phrase" : [ { "text" : "Xor the Got-Jewel", "offset" : 0, "length" : 17, "options" : [ { "text" : "xorr the the got got jewel", "score" : 3.5283546E-4 } ] } ] } ``` The suggest response contains a single 'simple phrase' entry, which in turn contains an 'option'. This option represents a suggestion for the queried text. It contains the corrected text and a score indicating the probability that this option was meant. Closes #2774	2013-03-13 19:36:29 +01:00
Jörg Prante	a127f2d2e8	avoiding NPE in Sigar FS	2013-03-13 10:05:59 -07:00
Alexander Reelsen	125b33d3dc	GeoJSONShapeParser parses JSON correctly and extracts coordinates even if 'crs' field is included. Fixes #2763	2013-03-13 15:17:21 +01:00
Simon Willnauer	365cde82d3	Use numOrds rather than numDocs as upperbound for sorting Closes #2773	2013-03-13 15:13:56 +01:00
Clinton Gormley	93ca6e2c4b	tieBreaker in MultiMatchQueryBuilder should be a float, not an integer Closes #2772	2013-03-13 13:44:59 +01:00
Shay Banon	5ed9fb2c54	support also mode in search sorting, and fail on illegal parameters	2013-03-12 15:16:25 -07:00
Shay Banon	55ceb01c44	force close connection if it's on a connect failure relates to Repeated ConnectExceptions in logs until node is restarted, fixes #2766	2013-03-12 14:49:07 -07:00
Shay Banon	877105ee19	no need for specific time / empty based classes, just as final fields	2013-03-12 12:32:05 -07:00
Igor Motov	3a534c64e5	Add dynamic settings validation Fixes #2749	2013-03-12 14:41:00 -04:00
Simon Willnauer	c008c59927	add missing license header	2013-03-12 14:57:47 +01:00
Simon Willnauer	c5395436e6	fix test bug where a small time window exists that can trigger a false failure due to default concurrent recoveries	2013-03-12 14:48:14 +01:00
Simon Willnauer	237c4ddf54	Introduce ParentIdCollector that collects only if the parent ID is non null, i.e. if the document has a parent. Closes #2744	2013-03-11 21:11:05 +01:00
Clinton Gormley	7961dfa7ab	Fixed a typo in an error message "should exists" -> "should exist"	2013-03-11 15:37:20 +01:00
Simon Willnauer	9442c41481	enable testcase that relied on a Lucene 4.2 fix	2013-03-11 12:57:24 +01:00
Simon Willnauer	4e7cff488e	add test that ensures that we bump the version on a Lucene Upgrade	2013-03-11 10:30:48 +01:00
Simon Willnauer	ebadd9ebbd	Fix tests since Lucene 4.2 we can support date math in Fuzzy-Search Syntax	2013-03-11 08:23:01 +01:00
Simon Willnauer	a37f1f55cc	Add tests for highlighting boost query. Closes #1314	2013-03-11 08:23:01 +01:00
Simon Willnauer	11bf7a8b1a	Upgrade to Lucene 4.2	2013-03-11 08:23:01 +01:00
Simon Willnauer	75fd6d4985	Added KeywordRepeatFilter that allows emitting stemmed and unstemmed versions of the same token if stemmers are used Closes #2753	2013-03-09 23:09:59 +01:00
Simon Willnauer	dc9a052287	Respect CandidateGenerator#size if set in the request and reduce the total number of candidates to the shard size. Closes #2752	2013-03-09 13:36:40 +01:00
Shay Banon	cc6c07365c	has_child query AVG score mode does not always work correctly fixes #2750	2013-03-08 08:50:11 -08:00
Shay Banon	eb956e7c09	Term/Terms filters on numeric fields gives wrong result fixes #2746	2013-03-07 22:12:22 -08:00
Shay Banon	c298c19177	don't use cache for ordinals for small max ord	2013-03-07 08:45:01 -08:00
Simon Willnauer	2c8d8ef8e0	check for null on setters that must not be null in IndicesReplicationOperationRequest	2013-03-07 10:18:10 +01:00
Benjamin Devèze	35f5ca915d	Add support for ignore_indices to delete by query Closes #2734	2013-03-07 10:17:51 +01:00
Simon Willnauer	12a2808168	exhaust object to allow subsequent objects to be parsed correctly	2013-03-06 15:34:59 +01:00
Simon Willnauer	1f217f6a7b	Move smoothing model into its own sub-object in the PhraseSuggest request Closes #2735	2013-03-06 14:31:21 +01:00
Shay Banon	e1409a9f0e	Problems with range searches for time with lte fixes #2731	2013-03-05 18:10:30 -08:00
Shay Banon	9a25867bfe	Network: A closed channel might not always fire up a close event fixes #2733	2013-03-05 11:49:10 -08:00
Igor Motov	acff102234	Implement search shards API Closes #2726	2013-03-05 09:17:59 -05:00
Simon Willnauer	1eb24d7efc	use a base ShingleFilterFactory to simplify default shingle detection	2013-03-05 12:32:50 +01:00
Simon Willnauer	0f95499703	if word scorer is on unigram make sure we score the current position not position 0	2013-03-05 12:31:32 +01:00
Simon Willnauer	876b5a3dcd	prefer totalTermFrequency over docFreq in PhraseSuggester	2013-03-05 10:46:25 +01:00
Simon Willnauer	315744be55	Set shardSize according to the total size if not explicitly specified. Closes #2729	2013-03-05 09:22:23 +01:00
Shay Banon	3e264f6b95	cleanup deletion of content in shards we are very conservative on when we delete data, remove the actual options of deleting data that are not used	2013-03-04 20:41:19 -08:00
Shay Banon	1ed07c1794	add a list of files that exist in the index to the failure	2013-03-04 18:15:06 -08:00
Shay Banon	d609571897	add close method to field data	2013-03-04 16:42:29 -08:00
Shay Banon	cfd8bddde4	Remove JMX connector creation flags, and JMX attributes closes #2728	2013-03-04 16:12:18 -08:00
Shay Banon	774622abfb	Change field data stats header from `field_data` to `fielddata`. fixes #2727	2013-03-04 23:50:33 +01:00
Shay Banon	d2dc672f43	allow to specify a list of settings to get a value for	2013-03-04 23:41:43 +01:00
Drew Raines	a8d52b58b6	Remove obsolete test.	2013-03-04 15:22:40 -06:00
Andrii Gakhov	dc28151ad7	fixed interchanged values in field_data stats fixes #2724	2013-03-04 11:19:33 +01:00
Shay Banon	a1b2434339	revert change on listing plugins on /_plugin we should provide it as part of nodes info relates to #2664	2013-03-03 21:52:44 +01:00
Shay Banon	a7da27c714	Field Data: Add `node` level cache type closes #2722	2013-03-03 19:55:06 +01:00
Shay Banon	e01879a698	add evictions stats to field data	2013-03-03 18:41:17 +01:00
Simon Willnauer	e9ba98913b	simplify searchShard selection when routing is present	2013-03-03 14:32:19 +01:00
Benjamin Devèze	09f20e3d4c	Fix bug when searching concrete and routing aliased indices Closes #2683	2013-03-03 14:31:57 +01:00
uboness	881cb7900c	Change geo_shapes support: * Exposed the spatial strategy to be configurable as part of the geo_shape mappings * Exposed the spatial strategy to be customizable at query time (will be used to generate the geo_shape filter/query) * Removed XTermQueryPrefixTreeStrategy and reverted to use the lucene TermQueryPrefixTreeStrategy instead * Made the RecursivePrefixTreeStrategy the default strategy to be used * Removed support for all spatial operations except "intersects" * Updated both the GeoShapeQueryBuilder and GeoShapeFilterBuilder with all the changes (removed the option of specifying the operation type (as only intersects is supported) and added the option of setting the filter/query spatial strategy) Closes #2720	2013-03-02 17:13:58 +01:00
Simon Willnauer	b9513511e0	Check for null query on Percolator query loading and omit the query if it can't be parsed. Closes #2547	2013-03-02 16:55:39 +01:00
Shay Banon	0be5a7888f	fix local flag in cluster health	2013-03-02 16:00:10 +01:00
Shay Banon	5dd18acd0e	proper reason for cluster state task	2013-03-02 15:48:01 +01:00
Shay Banon	50d121315b	add ability for cluster health to wait for current events to be processed help with tests that run on slow machines	2013-03-02 14:25:45 +01:00
tristanbuckner	9273d76cdf	Make BoolFilterBuilder output proper json	2013-03-02 01:07:50 +01:00
Shay Banon	ea097afd91	add proper testing for bool filter	2013-03-02 01:07:05 +01:00
Shay Banon	361d6bf89a	spin a bit to wait for condition in test, so slow machines will still run it correctly	2013-03-01 23:36:13 +01:00
Shay Banon	fe8b3725bb	lazy set the indices on the search request now that its validated	2013-03-01 22:45:59 +01:00
Shay Banon	6687ecb038	Query DSL: Filtered query to make query optional (defaults to match_all) closes #2718	2013-03-01 22:40:22 +01:00
Matt Weber	dfd92265b7	Correct order of routing and parent params for Get The order in which routing and parent parameters are set is important. The routing parameter must be set first or it will overwrite the parent routing value.	2013-03-01 22:24:14 +01:00
Shay Banon	2eea99255d	Analyze API returns in YAML format if analyzed string begins with --- fixes #2624	2013-03-01 22:17:09 +01:00
Shay Banon	9b68e98ea2	more strict check before trying to parse and detect a string as a date fixes #2694	2013-03-01 22:15:32 +01:00
Jeremy Jongsma	d16efbe47f	Throw correct ClassNotFoundException to debug classloader issues	2013-03-01 21:56:59 +01:00
Simon Willnauer	aaa3c48b3c	Throw IAE if indices is null or contains a null value. Closes #2656	2013-03-01 21:26:23 +01:00
Simon Willnauer	fced68c22d	ensure that suggestion only added on reduce if they are present in the shard response	2013-03-01 21:09:10 +01:00
Martijn van Groningen	d99b532f0f	Supporting sort modes `avg` and `sum` when sorting inside nested objects. Prior to this commit, either sort mode `min` or `max` (depending on sort order) was used when sort mode `avg` or `sum` was picked. Closes #2701	2013-03-01 19:53:20 +01:00
Simon Willnauer	39f362326e	Short circuit response if no indices exist and make sure listener is notified. Closes #2692	2013-03-01 15:15:56 +01:00
Simon Willnauer	3c1f291801	Fail in metadata parsing if the id path is not a value but rather an array or an object. Closes #2275	2013-03-01 13:00:29 +01:00
Simon Willnauer	b03f3fcd6c	throw IAE if fieldname is null - Closes #2711	2013-03-01 12:10:07 +01:00
Simon Willnauer	9c3898900d	always use the max score across the shards in suggest response	2013-03-01 12:09:29 +01:00
Shay Banon	30075bb6f9	add info in test for actual search failures	2013-03-01 00:00:09 +01:00
Shay Banon	849a3677cd	improve timing in test to wait for state with graceful timeouts (yet, validate early and exit when relevant)	2013-02-28 23:44:52 +01:00
Simon Willnauer	c90c5cbf85	fix bug in StupidBackoffScorer where previous word and current word were flipped, creating a non-existing bigram	2013-02-28 21:23:41 +01:00
Simon Willnauer	b4b3e350a6	Expose _explain via POST Closes #2710	2013-02-28 18:19:08 +01:00
Simon Willnauer	d4ec03ed76	# Phrase Suggester The `term` suggester provides a very convenient API to access word alternatives on a per-token basis within a certain string distance. The API allows accessing each token in the stream individually while suggest-selection is left to the API consumer. Yet, often already ranked / selected suggestions are required in order to present them to the end-user. Inside ElasticSearch we have the ability to access far more statistics and information quickly to make better decisions about which token alternative to pick, or whether to pick an alternative at all. This `phrase` suggester adds some logic on top of the `term` suggester to select entire corrected phrases instead of individual tokens, weighted based on n-gram language models. In practice it will be able to make better decisions about which tokens to pick based on co-occurrence and frequencies. The current implementation is kept quite general and leaves room for future improvements. # API Example The `phrase` request is defined alongside the query part in the JSON request: ```json curl -s -XPOST 'localhost:9200/_search' -d '{ "suggest" : { "text" : "Xor the Got-Jewel", "simple_phrase" : { "phrase" : { "analyzer" : "body", "field" : "bigram", "size" : 1, "real_word_error_likelihood" : 0.95, "max_errors" : 0.5, "gram_size" : 2, "direct_generator" : [ { "field" : "body", "suggest_mode" : "always", "min_word_len" : 1 } ] } } } }' ``` The response contains suggestions sorted by the most likely spell correction first. In this case we got the expected correction `xorr the god jewel` first, while the second correction is less conservative, with only one of the errors corrected. Note that the request is executed with `max_errors` set to `0.5`, so 50% of the terms can contain misspellings (see the parameter descriptions below). 
```json { "took" : 37, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 2938, "max_score" : 0.0, "hits" : [ ] }, "suggest" : { "simple_phrase" : [ { "text" : "Xor the Got-Jewel", "offset" : 0, "length" : 17, "options" : [ { "text" : "xorr the god jewel", "score" : 0.17877324 }, { "text" : "xor the god jewel", "score" : 0.14231323 } ] } ] } } ``` # Phrase suggest API ## Basic parameters * `field` - the name of the field used to do n-gram lookups for the language model; the suggester will use this field to gain statistics to score corrections. * `gram_size` - sets the maximum size of the n-grams (shingles) in the `field`. If the field doesn't contain n-grams (shingles), this should be omitted or set to `1`. * `real_word_error_likelihood` - the likelihood of a term being misspelled even if the term exists in the dictionary. The default is `0.95`, corresponding to 5% of the real words being misspelled. * `confidence` - The confidence level defines a factor applied to the input phrase's score, which is used as a threshold for other suggest candidates. Only candidates that score higher than the threshold will be included in the result. For instance, a confidence level of `1.0` will only return suggestions that score higher than the input phrase. If set to `0.0`, the top N candidates are returned. The default is `1.0`. * `max_errors` - the maximum percentage of the terms that can be considered misspellings in order to form a correction. This method accepts a float value in the range `[0..1)` as a fraction of the actual query terms, or a number `>=1` as an absolute number of query terms. The default is `1.0`, which means that only corrections with at most one misspelled term are returned. * `separator` - the separator that is used to separate terms in the bigram field. If not set, the whitespace character is used as a separator. 
* `size` - the number of candidates that are generated for each individual query term. Low numbers like `3` or `5` typically produce good results. Raising this can bring up terms with higher edit distances. The default is `5`. * `analyzer` - Sets the analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field passed via `field`. * `shard_size` - Sets the maximum number of suggested terms to be retrieved from each individual shard. During the reduce phase, only the top N suggestions are returned based on the `size` option. Defaults to `5`. * `text` - Sets the text / query to provide suggestions for. ## Smoothing Models The `phrase` suggester supports multiple smoothing models to balance weight between infrequent grams (grams (shingles) that do not exist in the index) and frequent grams (that appear at least once in the index). * `laplace` - the default model that uses an additive smoothing model where a constant (typically `1.0` or smaller) is added to all counts to balance weights. The default `alpha` is `0.5`. * `stupid_backoff` - a simple backoff model that backs off to lower order n-gram models if the higher order count is `0` and discounts the lower order n-gram model by a constant factor. The default `discount` is `0.4`. * `linear_interpolation` - a smoothing model that takes the weighted mean of the unigrams, bigrams and trigrams based on user supplied weights (lambdas). Linear Interpolation doesn't have any default values. All parameters (`trigram_lambda`, `bigram_lambda`, `unigram_lambda`) must be supplied. ## Candidate Generators The `phrase` suggester uses candidate generators to produce a list of possible terms per term in the given text. A single candidate generator is similar to a `term` suggester called for each individual term in the text. The output of the generators is subsequently scored in combination with the candidates from the other terms to form suggestion candidates. 
Currently only one type of candidate generator is supported, the `direct_generator`. The Phrase suggest API accepts a list of generators under the key `direct_generator`; each generator in the list is called per term in the original text. ## Direct Generators The direct generators support the following parameters: * `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion. * `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field. * `size` - The maximum corrections to be returned per suggest text token. * `suggest_mode` - The suggest mode controls which suggestions are included, or for which suggest text terms suggestions should be made. Three possible values can be specified: * `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default. * `popular` - Only suggest suggestions that occur in more docs than the original suggest text term. * `always` - Suggest any matching suggestions based on terms in the suggest text. * `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2. * `min_prefix` - The minimum number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms. * `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4. * `max_inspections` - A factor that is used to multiply with the `shard_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5. 
* `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified, then the number cannot be fractional. The shard level document frequencies are used for this option. * `max_query_frequency` - The maximum threshold in number of documents a suggest text token can exist in order to be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified, then it cannot be fractional. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly; on top of this, excluding them also improves the spellcheck performance. The shard level document frequencies are used for this option. * `pre_filter` - a filter (analyzer) that is applied to each of the tokens passed to this candidate generator. This filter is applied to the original token before candidates are generated. (optional) * `post_filter` - a filter (analyzer) that is applied to each of the generated tokens before they are passed to the actual phrase scorer. (optional) The following example shows a `phrase` suggest call with two generators: the first one uses a field containing ordinary indexed terms, and the second one uses a field whose terms are indexed with a `reverse` filter (each token is indexed with its characters reversed). This is used to overcome the limitation of the direct generators, which require a constant prefix to provide high-performance suggestions. The `pre_filter` and `post_filter` options accept ordinary analyzer names. 
```json curl -s -XPOST 'localhost:9200/_search' -d '{ "suggest" : { "text" : "Xor the Got-Jewel", "simple_phrase" : { "phrase" : { "analyzer" : "body", "field" : "bigram", "size" : 4, "real_word_error_likelihood" : 0.95, "confidence" : 2.0, "gram_size" : 2, "direct_generator" : [ { "field" : "body", "suggest_mode" : "always", "min_word_len" : 1 }, { "field" : "reverse", "suggest_mode" : "always", "min_word_len" : 1, "pre_filter" : "reverse", "post_filter" : "reverse" } ] } } } }' ``` `pre_filter` and `post_filter` can also be used to inject synonyms after candidates are generated. For instance, for the query `captain usq` we might generate a candidate `usa` for the term `usq`, which is a synonym for `america`; this allows presenting `captain america` to the user if this phrase scores high enough. Closes #2709	2013-02-28 16:17:59 +01:00
Shay Banon	2bc624806d	not bytes...	2013-02-28 16:02:38 +01:00
Shay Banon	7400c30eba	fail a shard if a merge failure occurs	2013-02-27 23:44:55 +01:00
Shay Banon	e908c723f1	don't log merge failures twice	2013-02-27 20:23:40 +01:00
Simon Willnauer	7be8f431d5	move id tests into SimpleQueryTests	2013-02-27 19:03:42 +01:00
Simon Willnauer	8ab602ec81	Fix AIOOB exception in UID type/id tuple creation. Closes #2695	2013-02-27 18:58:27 +01:00
Shay Banon	3b2d403292	malformed elasticsearch.yml causes unresponsive hang fixes #2693	2013-02-27 18:58:08 +01:00
Drew Raines	cb7a569f4b	Include preference in _count serialization and builder. [#2698]	2013-02-27 08:15:02 -06:00
Martijn van Groningen	ffbdc0a4c3	Updated postings format jdocs	2013-02-27 10:46:55 +01:00
Drew Raines	b53a8aff6a	Allow _count to take preference parameter. [#2698]	2013-02-26 16:24:52 -06:00
Shay Banon	1e937fd5d1	Allow index: "no" for _type fixes #2696	2013-02-26 22:06:52 +01:00
Martijn van Groningen	7c53d22ce9	Moved resolveClosestNestedObjectMapper to MapperService	2013-02-26 17:48:02 +01:00
Igor Motov	de243493c9	Changing dynamic index and cluster settings should work on master-only nodes Fixes #2675	2013-02-26 08:54:46 -05:00
Shay Banon	bd75b731c6	move to 0.90.0.Beta2 snap	2013-02-26 10:33:57 +01:00
Shay Banon	ab3a59e0bf	release 0.90.0.Beta1	2013-02-26 10:32:50 +01:00
Martijn van Groningen	2b5e3f5586	Fixed resolving closest nested object when sorting on a field inside nested object	2013-02-25 16:21:22 +01:00
Martijn van Groningen	c751df5ee5	Removed unused nested children collector.	2013-02-25 14:13:59 +01:00
Shay Banon	c7a05b1dda	add helper method to know if ObjectMappers have a nested mapping	2013-02-25 13:40:05 +01:00
Shay Banon	6e3300efd3	better error message on nested sorting	2013-02-25 13:32:00 +01:00
Shay Banon	4bb4e49155	Empty list in ids query should not fail, but match no docs relates to #2687	2013-02-25 12:51:34 +01:00
Shay Banon	bde36647fb	Terms/Ids filter: Support empty list of values, resulting in no match for it closes #2687 also closes #2686	2013-02-25 12:26:49 +01:00
Shay Banon	4145d154bb	add a test for empty lookup terms filter	2013-02-25 11:58:58 +01:00
Shay Banon	10ca4d5305	move internal stream facet type lookup to work with bytes	2013-02-25 10:57:18 +01:00
Lukas Vlcek	a42f9491b5	fix typo in exception	2013-02-24 07:47:25 +01:00
Shay Banon	595e0e254e	[Code refactoring] IndicesStats -> IndicesStatsResponse fixes #1782	2013-02-23 14:23:36 +01:00
David Pilato	4c493ac71d	Revert changes on *Request classes from issue Relative to #2657	2013-02-23 10:37:56 +01:00
David Pilato	a646e126e9	Display list of all available site plugins on /_plugins/ end point fix #2664	2013-02-23 09:34:06 +01:00
Shay Banon	eea3a01765	only return 404 on actual index settings missing, on "_all", return 200 relates to #2676	2013-02-22 23:08:38 +01:00
Shay Banon	915019587d	Get settings on empty node fails with ArrayIndexOutOfBoundsException[0] fixes #2676	2013-02-22 23:08:33 +01:00
Igor Motov	b8cc8e56c4	Improve stability of SimpleRobinEngineTests	2013-02-22 14:59:49 -05:00
Shay Banon	ad70105c39	keep the rescorer builder consistent with other builders, without the use of setters	2013-02-22 14:06:39 +01:00
Shay Banon	03fdc6aa80	Query DSL: Terms filter to allow for terms lookup from another document closes #2674	2013-02-22 14:04:10 +01:00
Shay Banon	6978aa2189	mark source as "safe" when copying it over	2013-02-22 12:59:41 +01:00
Shay Banon	a234e45b59	fix boolean to is from get relates to #2657	2013-02-22 12:45:56 +01:00
Igor Motov	ec3492c67c	Improve stability of the testReusePeerRecovery test	2013-02-21 16:06:33 -05:00
Shay Banon	b7f5295674	update jsr166y and jsr166e to latest versions	2013-02-21 21:11:14 +01:00
Shay Banon	4753ffdf1e	allow to set which queue implementation to use expert setting, but still would be great to be able to control it	2013-02-21 20:07:40 +01:00
Ilya Nazarov	da3d682f0e	Check for java-6-openjdk-i386 in init.d. There is a check for /usr/lib/jvm/java-6-openjdk-amd64, but none for 32-bit systems (/usr/lib/jvm/java-6-openjdk-i386).	2013-02-21 21:13:51 +07:00
Igor Motov	4ea4de6f8d	Add logging information for releasing node lock	2013-02-20 17:53:27 -05:00
Shay Banon	7bb092440a	facet refactoring, default collector base post implementation automatically implement post based on collector	2013-02-20 15:36:11 +01:00
Igor Motov	ce6f0e27bf	Make file distribution among several disks configurable Fixes #2650	2013-02-19 21:43:43 -05:00
David Pilato	b7afa0f44e	Fix test for Support trailing slashes on plugin _site URLs #2654	2013-02-19 21:16:47 +01:00
Martijn van Groningen	3b31c1216e	Made the `term_vector` json field the leading way of configuring term vectors. Supported options: `no`, `yes`, `with_offsets`, `with_positions`, `with_positions_offsets` and `with_positions_offsets_payloads`.	2013-02-19 20:55:43 +01:00
Igor Motov	5b9e9a004a	Make sure that in SitePluginTests http client connects to the correct node and closes the node after the test	2013-02-19 14:42:24 -05:00
Igor Motov	f96c1f1e10	When a node is leaving LocalDiscovery cluster, rerouting should be performed on the master node	2013-02-19 13:14:33 -05:00
Igor Motov	d126558dec	Add check for health timeout to shardCleanup test	2013-02-19 13:12:26 -05:00
David Pilato	8ab9d2dd1f	Support trailing slashes on plugin _site URLs fix #2654	2013-02-19 09:21:45 +01:00
Igor Motov	cfaa859bb2	Improve stability of UpdateNumberOfReplicasTests	2013-02-18 20:12:39 -05:00
Igor Motov	4222478b18	Make it simpler to determine which version of state was used to calculate health	2013-02-18 20:02:29 -05:00
Igor Motov	5746c50ef9	Improve stability of shardsCleanup test	2013-02-18 19:35:12 -05:00
Igor Motov	183a74c866	Improve stability of testSimpleAwareness test	2013-02-18 19:31:07 -05:00
Martijn van Groningen	303e87fb69	Added support for sorting by fields inside one or more nested objects. The sorting by nested field support has the following parameters on top of the already existing sort options: nested_path - Defines on which nested object to sort. The actual sort field must be a direct field inside this nested object. The default is to use the most immediate inherited nested object from the sort field. nested_filter - A filter that the inner objects inside the nested path should match in order for their field values to be taken into account by sorting. A common case is to repeat the query / filter inside the nested filter or query. By default no nested_filter is active. Either the highest (max) or lowest (min) inner object is picked during sorting, depending on the sort_mode being used. The sort_mode options avg and sum can still be used for number based fields inside nested objects. All the values for the sort field are taken into account for each nested object. Closes #2662	2013-02-18 22:10:41 +01:00
Simon Willnauer	8db436f107	Remove backported Lucene 4 spatial code in favor of the released version in Lucene 4.1	2013-02-18 18:43:55 +01:00
Jeffrey Gerard	0dfc2169d7	Added Testcase and BugFix fixing #2626 where GeoShape intersects filter omitted matching docs. SpatialPrefixTree#recursiveGetNodes uses an optimization that prevents recursion into the deepest tree level if a parent node in the penultimate level covers all its children. This produces a bug if the optimization happens both at indexing and at query/filter time. This patch fixes the bug by disabling the optimization at indexing time (to avoid adding overhead for query-heavy workloads). See LUCENE-4770 for reference	2013-02-18 18:43:47 +01:00
David Pilato	cc83c2f848	refactoring getter/setters Fixes #2657	2013-02-18 11:09:32 -05:00
Martijn van Groningen	ac2e6a3a4d	Fixed nested facets with filters.	2013-02-18 11:01:18 -05:00
Simon Willnauer	24291d40f4	Expose CJKWidthTokenFilter and CJKBigramTokenFilter Closes #2660	2013-02-18 11:01:17 -05:00
Shay Banon	547bd7abf2	add our own bloom filter implementation: uses more hash iterations, yet requires less memory for the same fpp relates to #2411	2013-02-18 11:01:17 -05:00
Igor Motov	512585da82	Fix race condition in adding TimeoutClusterStateListener Fixes #2658	2013-02-18 11:01:17 -05:00
Shay Banon	435eabd4a0	allow to access the global node settings in a static manner	2013-02-18 11:01:17 -05:00
Shay Banon	e365ecce10	fix check on which settings to change on	2013-02-18 11:01:17 -05:00
Shay Banon	73a447da86	initial facet refactoring the main goal of the facet refactoring is to allow for two modes of facet execution, collector based, that get callbacks as hits match, and post based, which iterates over all the relevant hits it also includes some simplification of the facet implementation	2013-02-16 02:25:04 +01:00
Shay Banon	06b82a45d4	Simplified range syntax when using a query string closes #2655	2013-02-15 01:30:55 +01:00
Shay Banon	4714a6acc9	Clear cache: allow to invalidate specific filter cache keys closes #2653	2013-02-14 21:13:19 +01:00
Shay Banon	c12c456192	add note on not using totalSize in merge	2013-02-14 14:30:46 +01:00
Shay Banon	e8e3dd1c9d	add 0.20.6 ver	2013-02-14 14:29:30 +01:00
Igor Motov	37f16127c5	Fix ScriptFilter cache key calculation Fixes #2651	2013-02-14 06:13:26 -05:00
Igor Motov	6b49457d9d	Optimize conversion to a cacheable DocIdSet	2013-02-13 21:04:54 -05:00
Shay Banon	883c593d7e	delay reroute only after we publish that a shard has started	2013-02-14 00:10:52 +01:00
Shay Banon	681239b413	Warmers do not load field data cache for sorting on new segments fixes #2649	2013-02-13 17:51:34 +01:00
Shay Banon	f41eccc7a5	updating non dynamic settings throws an error now	2013-02-13 14:28:16 +01:00
Martijn van Groningen	2193a8e401	Let the update index settings action fail if non dynamic settings are changed for open indices. Closes #2647	2013-02-13 13:10:56 +01:00
Shay Banon	5ad540a1aa	possibly incorrect use of Lucene OneMerge.totalBytesSize fixes #2643	2013-02-12 22:09:55 +01:00
Martijn van Groningen	3a2d40acd9	Added more trace logging related to finding master.	2013-02-12 21:40:12 +01:00
Shay Banon	5519f80abb	add increased timeout waiting for relocation when running on small boxes	2013-02-12 21:23:18 +01:00
Martijn van Groningen	fc13499ff5	Added `sort_mode` option that defines what value to pick in case the sort field is multi-valued. The `min` and `max` sort modes are supported for all field types: either the lowest or the highest value is picked. In addition, number-based fields also support `sum` and `avg` as sort modes. If the `sum` sort mode is used, all the values of the field belonging to a document are added together and the result is used as the sort value. If the `avg` sort mode is used, the average of all values of the sort field belonging to that document is used as the sort value. Relates to #2634	2013-02-12 20:38:24 +01:00
Shay Banon	7d13545e33	delete indices before running the tests	2013-02-12 19:28:48 +01:00
Shay Banon	668bcd0eb7	Bulk execution while a shard is replicating might send erroneous version conflict failures for certain items fixes #2642	2013-02-12 17:38:06 +01:00
Simon Willnauer	a7bbab7e87	# Rescore Feature The rescore feature allows rescoring documents returned by a query based on a secondary algorithm. Rescoring is commonly used if a scoring algorithm is too costly to be executed across the entire document set but efficient enough to be executed on the Top-K documents scored by a faster retrieval method. Rescoring can help to improve precision by reordering a larger Top-K window than is actually returned to the user. Typically it is executed on a window between 100 and 500 documents while the actual result window requested by the user remains the same. # Query Rescorer The `query` rescorer executes a secondary query only on the Top-K results of the actual user query and rescores the documents based on a linear combination of the user query's score and the score of the `rescore_query`. This allows any exposed query to be executed as a `rescore_query` and supports a `query_weight` as well as a `rescore_query_weight` to weight the factors of the linear combination. # Rescore API The `rescore` request is defined alongside the query part in the json request: ```json curl -s -XPOST 'localhost:9200/_search' -d '{ "query" : { "match" : { "field1" : { "query" : "the quick brown", "type" : "boolean", "operator" : "OR" } } }, "rescore" : { "window_size" : 50, "query" : { "rescore_query" : { "match" : { "field1" : { "query" : "the quick brown", "type" : "phrase", "slop" : 2 } } }, "query_weight" : 0.7, "rescore_query_weight" : 1.2 } } }' ``` Each `rescore` request is executed on a per-shard basis within the same roundtrip. Currently the rescore API has only one implementation (the `query` rescorer) which modifies the result set in-place. Future developments could include dedicated rescore results if needed by the implementation, e.g. a pair-wise reranker. Note: only regular queries are rescored; if the search type is set to `scan` or `count`, rescorers are not executed. Closes #2640	2013-02-12 17:10:00 +01:00
Shay Banon	c65aff7775	Index with no replicas might lose ongoing documents while relocating a shard fixes #26421	2013-02-12 17:03:28 +01:00
Martijn van Groningen	e54f010a4d	Also support camel case notation for minimal norwegian.	2013-02-12 16:39:11 +01:00
morsegel	ca7920a398	added norwegian minimal stemmer	2013-02-12 16:32:38 +01:00
Igor Motov	f98bd654a8	Fix filter cache stats calculation Fixes #2609	2013-02-11 10:28:53 -05:00
uboness	a2b87e28f6	fixed a bug in PrioritizedThreadPoolExecutor: now execute(Runnable) satisfies the priority and fifo nature of same-priority runnables	2013-02-09 04:20:16 +01:00
uboness	eef3610e12	fixed a bug in PrioritizedThreadPoolExecutor: now execute(Runnable) verifies the command is added as Comparable	2013-02-09 03:33:12 +01:00
uboness	678a8664f6	fixed a bug in PrioritizedThreadPoolExecutor: now execute(Runnable) verifies the command is added as PrioritizedRunnable	2013-02-09 03:26:52 +01:00
uboness	6d9048f8cc	added priority support for cluster state updates: * URGENT: * cluster_reroute (api) * refresh-mapping * cluster_update_settings * reroute_after_cluster_update_settings * create-index * delete-index * index-aliases * remove-index-template * create-index-template * update-mapping * remove-mapping * put-mapping * open-index * close-index * update-settings * HIGH * routing-table-updater * zen-disco-node_left * zen-disco-master_failed * shard-failed * shard-started * NORMAL * all other actions	2013-02-09 01:14:57 +01:00
Simon Willnauer	f5331c9535	Cleanup NumericFieldData. FieldData interfaces are reduced to long and double while internal representations still operate on the actual datatypes.	2013-02-08 20:58:36 +01:00
Martijn van Groningen	1189a2c2c2	Extended mv sorting integration test	2013-02-08 15:24:56 +01:00
Martijn van Groningen	8c7779057c	Added sort by fields that have multiple values per document. Closes #2634	2013-02-08 13:28:40 +01:00
Simon Willnauer	033d6e4306	don't use subtraction for comparison if datatypes can overflow	2013-02-08 10:07:31 +01:00
Martijn van Groningen	f97021b165	Fixes size assertion failure.	2013-02-07 16:50:54 +01:00
Martijn van Groningen	e2cb7edb08	Added more info to assert	2013-02-07 13:52:25 +01:00
Martijn van Groningen	e72e323c8a	Attempt to fix "No active shards" failure	2013-02-07 10:14:10 +01:00
Lee Hinman	ed43ad07d7	Throw a more meaningful message when no document is specified for indexing	2013-02-06 22:33:02 +01:00
Florian Schilling	a52e01f3e5	Remove XTermsFilter and UidFilter in favour of Lucene 4.1 TermsFilter	2013-02-06 18:45:05 +01:00
Igor Motov	6890c9fa62	Move action.wait_on_mapping_change setting to pom	2013-02-06 11:48:58 -05:00
Igor Motov	ed09ba0a18	Improve stability of RecoveryPercolatorTests Without "action.wait_on_mapping_change" setting set to true, the test node might get shutdown before updated mapping is saved.	2013-02-05 14:53:46 -05:00
Igor Motov	8277833f8d	Fix settings processing in WordDelimiterTokenFilterFactory	2013-02-05 10:03:00 -05:00
Martijn van Groningen	19295280d9	Made sure that wrapped child query / parent query gets rewritten only once.	2013-02-05 10:27:31 +01:00
Igor Motov	9e89323ad2	Add proper cleanup to InternalSettingsPerparerTests	2013-02-04 19:58:40 -05:00
Martijn van Groningen	bc667c378e	Made SoftWrapper fields final.	2013-02-04 14:47:36 +01:00
Martijn van Groningen	8109d13733	Use CacheRecycler when resolving parent docs in TopChildrenQuery.	2013-02-04 12:46:30 +01:00
Martijn van Groningen	9c3a86875b	Removed `execution_type` for has_child and has_parent.	2013-02-04 11:37:40 +01:00
Igor Motov	20ce01bd53	Add additional query validation to the terms query parser Fixes #2608	2013-02-03 09:44:16 -05:00
Shay Banon	ebc0c8cc6d	when we fix maxMergeAtOnce, make sure to not set it to 1 as its an illegal value	2013-02-01 19:00:01 +01:00
Shay Banon	a8c9e580ed	add getMaxOrd, and properly document the difference between it and numOrds	2013-02-01 16:13:13 +01:00
Shay Banon	6f1932ab67	support yaml detection on char sequence	2013-02-01 12:46:19 +01:00
Simon Willnauer	6468c15446	check for == 0 rather than > 0	2013-02-01 11:11:47 +01:00
Simon Willnauer	c18ae4a194	fix getMemorySizeInBytes in SparseMultiArrayOrdinals	2013-02-01 11:09:09 +01:00
Igor Motov	45b2bff8da	Improve SearchStatsTests Added refresh to guarantee that at least something will be fetched on a fast computer.	2013-01-31 21:19:08 -05:00
Igor Motov	ca635deb36	Allow health to be executed on a local node instead of the master	2013-01-31 21:19:08 -05:00
Igor Motov	3c9541dd14	Make facet and sort tests more reliable in case of multiple nodes and shards Stats, histogram and range facets and sorting currently fail if a field that they are running on is not defined in the mapping. In case of dynamic fields it might mean that by the time the facet query is executed the new field mapping might not be propagated to all nodes yet.	2013-01-31 21:19:07 -05:00
Igor Motov	6a01e7882c	Improve shardsCleanup test When startNode exits there is no guarantee that shard cleanup is finished because the cleanup operation is performed on another thread and startNode doesn't wait for it to complete. Therefore we might need to wait for the shard to disappear.	2013-01-31 21:18:14 -05:00

1581 Commits