1565 Commits

Author SHA1 Message Date
tristanbuckner 9273d76cdf Make BoolFilterBuilder output proper json 2013-03-02 01:07:50 +01:00
Shay Banon ea097afd91 add proper testing for bool filter 2013-03-02 01:07:05 +01:00
Shay Banon 361d6bf89a spin a bit to wait for condition in test, so slow machines will still run it correctly 2013-03-01 23:36:13 +01:00
Shay Banon fe8b3725bb lazy set the indices on the search request now that it's validated 2013-03-01 22:45:59 +01:00
Shay Banon 6687ecb038 Query DSL: Filtered query to make query optional (defaults to match_all)
closes #2718
2013-03-01 22:40:22 +01:00
Matt Weber dfd92265b7 Correct order of routing and parent params for Get
The order in which routing and parent parameters are set is important.  The
routing parameter must be set first or it will overwrite the parent routing
value.
2013-03-01 22:24:14 +01:00
Shay Banon 2eea99255d Analyze API returns in YAML format if analyzed string begins with ---
fixes #2624
2013-03-01 22:17:09 +01:00
Shay Banon 9b68e98ea2 more strict check before trying to parse and detect a string as a date
fixes #2694
2013-03-01 22:15:32 +01:00
Jeremy Jongsma d16efbe47f Throw correct ClassNotFoundException to debug classloader issues 2013-03-01 21:56:59 +01:00
Simon Willnauer aaa3c48b3c Throw IAE if indices is null or contains a null value.
Closes #2656
2013-03-01 21:26:23 +01:00
Simon Willnauer fced68c22d ensure that suggestions are only added on reduce if they are present in the shard response 2013-03-01 21:09:10 +01:00
Martijn van Groningen d99b532f0f Supporting sort modes `avg` and `sum` when sorting inside nested objects.
Prior to this commit, either sort mode `min` or `max` (depending on sort order) was used when sort modes `avg` and `sum` were picked.

Closes #2701
2013-03-01 19:53:20 +01:00
Simon Willnauer 39f362326e Short-circuit response if no indices exist and make sure listener is notified.
Closes #2692
2013-03-01 15:15:56 +01:00
Simon Willnauer 3c1f291801 Fail in metadata parsing if the id path is not a value but rather an array or an object.
Closes #2275
2013-03-01 13:00:29 +01:00
Simon Willnauer b03f3fcd6c throw IAE if fieldname is null - Closes #2711 2013-03-01 12:10:07 +01:00
Simon Willnauer 9c3898900d always use the max score across the shards in suggest response 2013-03-01 12:09:29 +01:00
Shay Banon 30075bb6f9 add info in test for actual search failures 2013-03-01 00:00:09 +01:00
Shay Banon 849a3677cd improve timing in test to wait for state with graceful timeouts
(yet, validate early and exit when relevant)
2013-02-28 23:44:52 +01:00
Simon Willnauer c90c5cbf85 fix bug in StupidBackoffScorer where previous word and current word were flipped, creating a non-existing bigram 2013-02-28 21:23:41 +01:00
Simon Willnauer b4b3e350a6 Expose _explain via POST
Closes #2710
2013-02-28 18:19:08 +01:00
Simon Willnauer d4ec03ed76 # Phrase Suggester
The `term` suggester provides a very convenient API to access word alternatives on a per-token
basis within a certain string distance. The API allows accessing each token in the stream
individually while suggest-selection is left to the API consumer. Yet, often already ranked
/ selected suggestions are required in order to present them to the end-user.
Inside ElasticSearch we have the ability to access far more statistics and information quickly
to make better decisions about which token alternative to pick, or whether to pick an alternative at all.

This `phrase` suggester adds some logic on top of the `term` suggester to select entire
corrected phrases instead of individual tokens, weighted based on *n-gram language models*. In practice it
will be able to make better decisions about which tokens to pick based on co-occurrence and frequencies.
The current implementation is kept quite general and leaves room for future improvements.

# API Example

The `phrase` request is defined alongside the query part in the JSON request:

```json
curl -s -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "body",
        "field" : "bigram",
        "size" : 1,
        "real_word_error_likelihood" : 0.95,
        "max_errors" : 0.5,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "body",
          "suggest_mode" : "always",
          "min_word_len" : 1
        } ]
      }
    }
  }
}'
```

The response contains suggestions sorted by the most likely spell correction first. In this case we got the expected correction
`xorr the god jewel` first, while the second correction is less conservative, where only one of the errors is corrected. Note, the request
is executed with `max_errors` set to `0.5`, so 50% of the terms can contain misspellings (see parameter descriptions below).

```json
{
  "took" : 37,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2938,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "simple_phrase" : [ {
      "text" : "Xor the Got-Jewel",
      "offset" : 0,
      "length" : 17,
      "options" : [ {
        "text" : "xorr the god jewel",
        "score" : 0.17877324
      }, {
        "text" : "xor the god jewel",
        "score" : 0.14231323
      } ]
    } ]
  }
}
```

# Phrase suggest API

## Basic parameters

* `field` - the name of the field used to do n-gram lookups for the language model. The suggester will use this field to gain statistics to score corrections.
* `gram_size` - sets the max size of the n-grams (shingles) in the `field`. If the field doesn't contain n-grams (shingles) this should be omitted or set to `1`.
* `real_word_error_likelihood` - the likelihood of a term being misspelled even if the term exists in the dictionary. The default is `0.95`, corresponding to 5% of the real words being misspelled.
* `confidence` - The confidence level defines a factor applied to the input phrase's score, which is used as a threshold for other suggest candidates. Only candidates that score higher than the threshold will be included in the result. For instance, a confidence level of `1.0` will only return suggestions that score higher than the input phrase. If set to `0.0` the top N candidates are returned. The default is `1.0`.
* `max_errors` - the maximum percentage of the terms that can at most be considered misspellings in order to form a correction. This option accepts a float value in the range `[0..1)` as a fraction of the actual query terms, or a number `>=1` as an absolute number of query terms. The default is set to `1.0`, which means that only corrections with at most 1 misspelled term are returned.
* `separator` - the separator that is used to separate terms in the bigram field. If not set, the whitespace character is used as a separator.
* `size` - the number of candidates that are generated for each individual query term. Low numbers like `3` or `5` typically produce good results. Raising this can bring up terms with higher edit distances. The default is `5`.
* `analyzer` - Sets the analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field passed via `field`.
* `shard_size` - Sets the maximum number of suggested terms to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned, based on the `size` option. Defaults to `5`.
* `text` - Sets the text / query to provide suggestions for.
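
The two interpretations of `max_errors` and the `confidence` threshold can be sketched as follows. This is only an illustration of the parameter descriptions above, not the suggester's source: the helper names `allowed_misspellings` and `passes_confidence` are hypothetical, and rounding the fractional case to the nearest integer is an assumption.

```python
import math


def allowed_misspellings(max_errors: float, num_terms: int) -> int:
    """Translate `max_errors` into a number of terms that may be corrected.

    Values in [0, 1) are read as a fraction of the query terms, values >= 1
    as an absolute number of terms. Rounding half up is an assumption.
    """
    if max_errors < 0:
        raise ValueError("max_errors must be non-negative")
    if max_errors >= 1:
        return int(max_errors)
    return math.floor(max_errors * num_terms + 0.5)


def passes_confidence(candidate_score: float, input_score: float,
                      confidence: float = 1.0) -> bool:
    """Keep a candidate only if it scores higher than confidence * input score."""
    return candidate_score > confidence * input_score
```

With `max_errors` set to `0.5` and a four-term input, `allowed_misspellings(0.5, 4)` gives `2`; with the default `1.0` it gives `1` regardless of the input length.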

## Smoothing Models
The `phrase` suggester supports multiple smoothing models to balance weight between infrequent grams (grams (shingles) that do not exist in the index) and frequent grams (that appear at least once in the index).
* `laplace` - the default model that uses an additive smoothing model where a constant (typically `1.0` or smaller) is added to all counts to balance weights. The default `alpha` is `0.5`.
* `stupid_backoff` - a simple backoff model that backs off to lower order n-gram models if the higher order count is `0` and discounts the lower order n-gram model by a constant factor. The default `discount` is `0.4`.
* `linear_interpolation` - a smoothing model that takes the weighted mean of the unigrams, bigrams and trigrams based on user supplied weights (lambdas). Linear Interpolation doesn't have any default values. All parameters (`trigram_lambda`, `bigram_lambda`, `unigram_lambda`) must be supplied.
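
To make the three models concrete, here is a minimal sketch over a toy corpus using the textbook forms of additive smoothing, stupid backoff and linear interpolation, restricted to bigrams for brevity. The function names, the toy corpus and the exact normalisation are assumptions for illustration; the scorers in the suggester may compute these differently.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tokens = "the god jewel and the god king".split()
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
total_tokens = sum(unigrams.values())
vocabulary_size = len(unigrams)


def laplace(prev, word, alpha=0.5):
    # Additive smoothing: alpha is added to every count.
    return (bigrams[(prev, word)] + alpha) / (unigrams[(prev,)] + alpha * vocabulary_size)


def stupid_backoff(prev, word, discount=0.4):
    # Use the bigram estimate if the bigram was seen, otherwise
    # back off to the discounted unigram estimate.
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[(prev,)]
    return discount * unigrams[(word,)] / total_tokens


def linear_interpolation(prev, word, bigram_lambda, unigram_lambda):
    # Weighted mean of the bigram and unigram estimates; no default weights.
    p_bigram = bigrams[(prev, word)] / unigrams[(prev,)] if unigrams[(prev,)] else 0.0
    p_unigram = unigrams[(word,)] / total_tokens
    return bigram_lambda * p_bigram + unigram_lambda * p_unigram
```

The unseen bigram `the king` still receives a small non-zero score under each model, which is the point of smoothing: an unseen gram lowers a candidate phrase's score without eliminating it.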

## Candidate Generators
The `phrase` suggester uses candidate generators to produce a list of possible terms per term in the given text. A single candidate generator is similar to a `term` suggester called for each individual term in the text. The output of the generators is subsequently scored in combination with the candidates from the other terms to form suggestion candidates.
Currently only one type of candidate generator is supported, the `direct_generator`. The phrase suggest API accepts a list of generators under the key `direct_generator`; each of the generators in the list is called per term in the original text.

## Direct Generators

The direct generators support the following parameters:

* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum number of corrections to be returned per suggest text token.
* `suggest_mode` - The suggest mode controls which suggestions are included, or for which suggest text terms suggestions should be generated. Three possible values can be specified:
 * `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
 * `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
 * `always` - Suggest any matching suggestions based on terms in the suggest text.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimal number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur at the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `max_inspections` - A factor that is used to multiply with the `shard_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum threshold in number of documents a suggest text token can exist in, in order to be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified then it cannot be fractional. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly; on top of this, it also improves the spellcheck performance. The shard level document frequencies are used for this option.
* `pre_filter` - a filter (analyzer) that is applied to each of the tokens passed to this candidate generator. This filter is applied to the original token before candidates are generated. (optional)
* `post_filter` - a filter (analyzer) that is applied to each of the generated tokens before they are passed to the actual phrase scorer. (optional)
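
The three `suggest_mode` values can be read as a predicate over document frequencies. The sketch below follows the descriptions in the list above; the function name `should_suggest` and its arguments are hypothetical, and the generator itself may apply the modes at a different stage.

```python
def should_suggest(suggest_mode: str, term_doc_freq: int, candidate_doc_freq: int) -> bool:
    """Decide whether a candidate may be suggested for a suggest text term.

    term_doc_freq: number of docs containing the original term.
    candidate_doc_freq: number of docs containing the candidate.
    """
    if suggest_mode == "missing":
        # Only terms that are not in the index get suggestions.
        return term_doc_freq == 0
    if suggest_mode == "popular":
        # Only candidates occurring in more docs than the original term.
        return candidate_doc_freq > term_doc_freq
    if suggest_mode == "always":
        return True
    raise ValueError("unknown suggest_mode: %r" % suggest_mode)
```

For a term that appears in 3 documents, `missing` suppresses all candidates, `popular` keeps only candidates found in more than 3 documents, and `always` keeps every candidate.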

The following example shows a `phrase` suggest call with two generators: the first one uses a field containing ordinary indexed terms and the second one uses a field that uses
terms indexed with a `reverse` filter (tokens are indexed in reverse order). This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions. The `pre_filter` and `post_filter` options accept ordinary analyzer names.

```json
curl -s -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "body",
        "field" : "bigram",
        "size" : 4,
        "real_word_error_likelihood" : 0.95,
        "confidence" : 2.0,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "body",
          "suggest_mode" : "always",
          "min_word_len" : 1
        }, {
          "field" : "reverse",
          "suggest_mode" : "always",
          "min_word_len" : 1,
          "pre_filter" : "reverse",
          "post_filter" : "reverse"
        } ]
      }
    }
  }
}'
```

`pre_filter` and `post_filter` can also be used to inject synonyms after candidates are generated. For instance, for the query `captain usq` we might generate a candidate `usa` for the term `usq`, which is a synonym for `america`. This allows us to present `captain america` to the user if this phrase scores high enough.

Closes #2709
2013-02-28 16:17:59 +01:00
Shay Banon 2bc624806d not bytes... 2013-02-28 16:02:38 +01:00
Shay Banon 7400c30eba fail a shard if a merge failure occurs 2013-02-27 23:44:55 +01:00
Shay Banon e908c723f1 don't log merge failures twice 2013-02-27 20:23:40 +01:00
Simon Willnauer 7be8f431d5 move id tests into SimpleQueryTests 2013-02-27 19:03:42 +01:00
Simon Willnauer 8ab602ec81 Fix AIOOB exception in UID type/id tuple creation.
Closes #2695
2013-02-27 18:58:27 +01:00
Shay Banon 3b2d403292 malformed elasticsearch.yml causes unresponsive hang
fixes #2693
2013-02-27 18:58:08 +01:00
Drew Raines cb7a569f4b Include preference in _count serialization and builder. [#2698] 2013-02-27 08:15:02 -06:00
Martijn van Groningen ffbdc0a4c3 Updated postings format jdocs 2013-02-27 10:46:55 +01:00
Drew Raines b53a8aff6a Allow _count to take preference parameter. [#2698] 2013-02-26 16:24:52 -06:00
Shay Banon 1e937fd5d1 Allow index: "no" for _type
fixes #2696
2013-02-26 22:06:52 +01:00
Martijn van Groningen 7c53d22ce9 Moved resolveClosestNestedObjectMapper to MapperService 2013-02-26 17:48:02 +01:00
Igor Motov de243493c9 Changing dynamic index and cluster settings should work on master-only nodes
Fixes #2675
2013-02-26 08:54:46 -05:00
Shay Banon bd75b731c6 move to 0.90.0.Beta2 snap 2013-02-26 10:33:57 +01:00
Shay Banon ab3a59e0bf release 0.90.0.Beta1 2013-02-26 10:32:50 +01:00
Martijn van Groningen 2b5e3f5586 Fixed resolving closest nested object when sorting on a field inside nested object 2013-02-25 16:21:22 +01:00
Martijn van Groningen c751df5ee5 Removed unused nested children collector. 2013-02-25 14:13:59 +01:00
Shay Banon c7a05b1dda add helper method to know if ObjectMappers have a nested mapping 2013-02-25 13:40:05 +01:00
Shay Banon 6e3300efd3 better error message on nested sorting 2013-02-25 13:32:00 +01:00
Shay Banon 4bb4e49155 Empty list in ids query should not fail, but match no docs
relates to #2687
2013-02-25 12:51:34 +01:00
Shay Banon bde36647fb Terms/Ids filter: Support empty list of values, resulting in no match for it
closes #2687
also closes #2686
2013-02-25 12:26:49 +01:00
Shay Banon 4145d154bb add a test for empty lookup terms filter 2013-02-25 11:58:58 +01:00
Shay Banon 10ca4d5305 move internal stream facet type lookup to work with bytes 2013-02-25 10:57:18 +01:00
Lukas Vlcek a42f9491b5 fix typo in exception 2013-02-24 07:47:25 +01:00
Shay Banon 595e0e254e [Code refactoring] IndicesStats -> IndicesStatsResponse
fixes #1782
2013-02-23 14:23:36 +01:00
David Pilato 4c493ac71d Revert changes on *Request classes from issue
Relative to #2657
2013-02-23 10:37:56 +01:00
David Pilato a646e126e9 Display list of all available site plugins on /_plugins/ end point fix #2664 2013-02-23 09:34:06 +01:00
Shay Banon eea3a01765 only return 404 on actual index settings missing, on "_all", return 200
relates to #2676
2013-02-22 23:08:38 +01:00
Shay Banon 915019587d Get settings on empty node fails with ArrayIndexOutOfBoundsException[0]
fixes #2676
2013-02-22 23:08:33 +01:00
Igor Motov b8cc8e56c4 Improve stability of SimpleRobinEngineTests 2013-02-22 14:59:49 -05:00
Shay Banon ad70105c39 keep the rescorer builder consistent with other builders, without the use of setters 2013-02-22 14:06:39 +01:00
Shay Banon 03fdc6aa80 Query DSL: Terms filter to allow for terms lookup from another document
closes #2674
2013-02-22 14:04:10 +01:00
Shay Banon 6978aa2189 mark source as "safe" when copying it over 2013-02-22 12:59:41 +01:00
Shay Banon a234e45b59 fix boolean to is from get
relates to #2657
2013-02-22 12:45:56 +01:00
Igor Motov ec3492c67c Improve stability of the testReusePeerRecovery test 2013-02-21 16:06:33 -05:00
Shay Banon b7f5295674 update jsr166y and jsr166e to latest versions 2013-02-21 21:11:14 +01:00
Shay Banon 4753ffdf1e allow to set which queue implementation to use
expert setting, but still would be great to be able to control it
2013-02-21 20:07:40 +01:00
Ilya Nazarov da3d682f0e Check for java-6-openjdk-i386 in init.d
There is a check for /usr/lib/jvm/java-6-openjdk-amd64, but none for 32-bit systems (/usr/lib/jvm/java-6-openjdk-i386).
2013-02-21 21:13:51 +07:00
Igor Motov 4ea4de6f8d Add logging information for releasing node lock 2013-02-20 17:53:27 -05:00
Shay Banon 7bb092440a facet refactoring, default collector base post implementation
automatically implement post based on collector
2013-02-20 15:36:11 +01:00
Igor Motov ce6f0e27bf Make file distribution among several disks configurable
Fixes #2650
2013-02-19 21:43:43 -05:00
David Pilato b7afa0f44e Fix test for Support trailing slashes on plugin _site URLs #2654 2013-02-19 21:16:47 +01:00
Martijn van Groningen 3b31c1216e Made the `term_vector` json field the leading way of configuring term vectors. Supported options: `no`, `yes`, `with_offsets`, `with_positions`, `with_positions_offsets` and `with_positions_offsets_payloads`. 2013-02-19 20:55:43 +01:00
Igor Motov 5b9e9a004a Make sure that in SitePluginTests http client connects to the correct node and closes the node after the test 2013-02-19 14:42:24 -05:00
Igor Motov f96c1f1e10 When a node is leaving LocalDiscovery cluster, rerouting should be performed on the master node 2013-02-19 13:14:33 -05:00
Igor Motov d126558dec Add check for health timeout to shardCleanup test 2013-02-19 13:12:26 -05:00
David Pilato 8ab9d2dd1f Support trailing slashes on plugin _site URLs fix #2654 2013-02-19 09:21:45 +01:00
Igor Motov cfaa859bb2 Improve stability of UpdateNumberOfReplicasTests 2013-02-18 20:12:39 -05:00
Igor Motov 4222478b18 Make it simpler to determine which version of state was used to calculate health 2013-02-18 20:02:29 -05:00
Igor Motov 5746c50ef9 Improve stability of shardsCleanup test 2013-02-18 19:35:12 -05:00
Igor Motov 183a74c866 Improve stability of testSimpleAwareness test 2013-02-18 19:31:07 -05:00
Martijn van Groningen 303e87fb69 Added support for sorting by fields inside one or more nested objects.
The sorting by nested field support has the following parameters on top of the already existing sort options:

nested_path - Defines on which nested object to sort. The actual sort field must be a direct field inside this nested object. The default is to use the most immediate inherited nested object from the sort field.
nested_filter - A filter that the inner objects inside the nested path should match in order for their field values to be taken into account by sorting. A common case is to repeat the query / filter inside the nested filter or query. By default no nested_filter is active.
Either the highest (max) or lowest (min) inner object is picked during sorting, depending on the sort_mode being used. The sort_mode options avg and sum can still be used for number based fields inside nested objects. All the values for the sort field are taken into account for each nested object.

Closes #2662
2013-02-18 22:10:41 +01:00
Simon Willnauer 8db436f107 Remove backported Lucene 4 spatial code in favor of the released version in Lucene 4.1 2013-02-18 18:43:55 +01:00
Jeffrey Gerard 0dfc2169d7 Added Testcase and BugFix fixing #2626 where GeoShape intersects filter omitted matching docs.
SpatialPrefixTree#recursiveGetNodes uses an optimization that prevents
recursion into the deepest tree level if a parent node in the penultimate
level covers all its children.  This produces a bug if the optimization
happens both at indexing and at query/filter time.

This patch fixes the bug by disabling the optimization at indexing time
(to avoid adding overhead for query-heavy workloads).

See LUCENE-4770 for reference
2013-02-18 18:43:47 +01:00
David Pilato cc83c2f848 refactoring getter/setters
Fixes #2657
2013-02-18 11:09:32 -05:00
Martijn van Groningen ac2e6a3a4d Fixed nested facets with filters. 2013-02-18 11:01:18 -05:00
Simon Willnauer 24291d40f4 Expose CJKWidthTokenFilter and CJKBigramTokenFilter
Closes #2660
2013-02-18 11:01:17 -05:00
Shay Banon 547bd7abf2 add our own bloom filter implementation
uses more hash iterations, yet require less memory for the same fpp
relates to #2411
2013-02-18 11:01:17 -05:00
Igor Motov 512585da82 Fix race condition in adding TimeoutClusterStateListener
Fixes #2658
2013-02-18 11:01:17 -05:00
Shay Banon 435eabd4a0 allow to access the global node settings in a static manner 2013-02-18 11:01:17 -05:00
Shay Banon e365ecce10 fix check on which settings to change on 2013-02-18 11:01:17 -05:00
Shay Banon 73a447da86 initial facet refactoring
the main goal of the facet refactoring is to allow for two modes of facet execution: collector based, which gets callbacks as hits match, and post based, which iterates over all the relevant hits
it also includes some simplification of the facet implementation
2013-02-16 02:25:04 +01:00
Shay Banon 06b82a45d4 Simplified range syntax when using a query string
closes #2655
2013-02-15 01:30:55 +01:00
Shay Banon 4714a6acc9 Clear cache: allow to invalidate specific filter cache keys
closes #2653
2013-02-14 21:13:19 +01:00
Shay Banon c12c456192 add note on not using totalSize in merge 2013-02-14 14:30:46 +01:00
Shay Banon e8e3dd1c9d add 0.20.6 ver 2013-02-14 14:29:30 +01:00
Igor Motov 37f16127c5 Fix ScriptFilter cache key calculation
Fixes #2651
2013-02-14 06:13:26 -05:00
Igor Motov 6b49457d9d Optimize conversion to a cacheable DocIdSet 2013-02-13 21:04:54 -05:00
Shay Banon 883c593d7e delay reroute only after we publish that a shard has started 2013-02-14 00:10:52 +01:00
Shay Banon 681239b413 Warmers do not load field data cache for sorting on new segments
fixes #2649
2013-02-13 17:51:34 +01:00
Shay Banon f41eccc7a5 updating non dynamic settings throws an error now 2013-02-13 14:28:16 +01:00
Martijn van Groningen 2193a8e401 Let the update index settings action fail if non dynamic settings are changed for open indices.
Closes #2647
2013-02-13 13:10:56 +01:00
Shay Banon 5ad540a1aa possibly incorrect use of Lucene OneMerge.totalBytesSize
fixes #2643
2013-02-12 22:09:55 +01:00
Martijn van Groningen 3a2d40acd9 Added more trace logging related to finding master. 2013-02-12 21:40:12 +01:00
Shay Banon 5519f80abb add increased timeout waiting for relocation when running on small boxes 2013-02-12 21:23:18 +01:00
Martijn van Groningen fc13499ff5 Added `sort_mode` option that defines what value to pick in the case the sort field is multi-valued.
The `min` and `max` sort modes are supported for all field types. Either the lowest value or the highest value is picked. In addition to that number based fields also support `sum` and `avg` as sort mode. If `sum` sort mode is used then all the values for a field and belonging to a document are added together and the result of that is used as sort value. If the `avg` sort mode is used then the average of all values for the sort field belonging to that document is used as sort value.
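
The four sort modes can be illustrated with a small sketch. The helper names `sort_value` and `sort_docs` and the sample documents are hypothetical; the sketch only mirrors the description above and does not reflect how the values are read from field data.

```python
def sort_value(values, sort_mode):
    """Reduce the values of a multi-valued sort field to a single sort value."""
    if not values:
        raise ValueError("document has no values for the sort field")
    if sort_mode == "min":
        return min(values)
    if sort_mode == "max":
        return max(values)
    if sort_mode == "sum":  # number based fields only
        return sum(values)
    if sort_mode == "avg":  # number based fields only
        return sum(values) / len(values)
    raise ValueError("unknown sort_mode: %r" % sort_mode)


def sort_docs(docs, sort_mode, reverse=False):
    """Return doc ids ordered by their reduced sort value."""
    return sorted(docs, key=lambda doc_id: sort_value(docs[doc_id], sort_mode), reverse=reverse)


# Hypothetical documents with a multi-valued numeric field.
docs = {"doc1": [4, 10], "doc2": [1, 2, 6], "doc3": [8]}
```

With these documents, ascending order is `doc2, doc1, doc3` for `min` and `avg`, but `doc3, doc2, doc1` for `sum`, so the chosen mode can change the ranking.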

Relates to #2634
2013-02-12 20:38:24 +01:00
Shay Banon 7d13545e33 delete indices before running the tests 2013-02-12 19:28:48 +01:00
Shay Banon 668bcd0eb7 Bulk execution while a shard is replication might send erroneous version conflict failures for certain items
fixes #2642
2013-02-12 17:38:06 +01:00
Simon Willnauer a7bbab7e87 # Rescore Feature
The rescore feature allows rescoring a document returned by a query based
on a secondary algorithm. Rescoring is commonly used if a scoring algorithm
is too costly to be executed across the entire document set but efficient enough
to be executed on the Top-K documents scored by a faster retrieval method. Rescoring
can help to improve precision by reordering a larger Top-K window than actually
returned to the user. Typically it is executed on a window between 100 and 500 documents
while the actual result window requested by the user remains the same.

# Query Rescorer

The `query` rescorer executes a secondary query only on the Top-K results of the actual
user query and rescores the documents based on a linear combination of the user query's score
and the score of the `rescore_query`. This allows executing any exposed query as a
`rescore_query` and supports a `query_weight` as well as a `rescore_query_weight` to weight the
factors of the linear combination.
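
The linear combination can be sketched on a single shard's hit list as follows. The function `rescore` and its inputs are hypothetical, and treating documents that do not match the `rescore_query` as contributing a rescore score of `0.0` is an assumption rather than something this commit message states.

```python
def rescore(hits, rescore_scores, window_size, query_weight, rescore_query_weight):
    """Rescore the top `window_size` hits and re-sort only that window.

    hits: list of (doc_id, score) pairs, already sorted by descending score.
    rescore_scores: dict mapping doc_id to its score under the rescore query.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        (doc_id,
         query_weight * score + rescore_query_weight * rescore_scores.get(doc_id, 0.0))
        for doc_id, score in window
    ]
    # Only the window is reordered; hits outside it keep their position and score.
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return rescored + rest


hits = [("a", 2.0), ("b", 1.5), ("c", 1.0)]
reranked = rescore(hits, {"b": 1.0}, window_size=2,
                   query_weight=0.7, rescore_query_weight=1.2)
```

Here document `b` matches the secondary query and moves ahead of `a`, while `c` lies outside the window and is left untouched.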

# Rescore API

The `rescore` request is defined alongside the query part in the JSON request:

```json
curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    "match" : {
      "field1" : {
        "query" : "the quick brown",
        "type" : "boolean",
        "operator" : "OR"
      }
    }
  },
  "rescore" : {
    "window_size" : 50,
    "query" : {
      "rescore_query" : {
        "match" : {
          "field1" : {
            "query" : "the quick brown",
            "type" : "phrase",
            "slop" : 2
          }
        }
      },
      "query_weight" : 0.7,
      "rescore_query_weight" : 1.2
    }
  }
}'
```

Each `rescore` request is executed on a per-shard basis within the same roundtrip. Currently the rescore API
has only one implementation (the `query` rescorer) which modifies the result set in-place. Future developments
could include dedicated rescore results if needed by the implementation, e.g. a pair-wise reranker.
*Note:* Only regular queries are rescored; if the search type is set to `scan` or `count`, rescorers are not executed.

Closes #2640
2013-02-12 17:10:00 +01:00
Shay Banon c65aff7775 Index with no replicas might lose ongoing documents while relocating a shard
fixes #26421
2013-02-12 17:03:28 +01:00
Martijn van Groningen e54f010a4d Also support camel case notation for minimal norwegian. 2013-02-12 16:39:11 +01:00
morsegel ca7920a398 added norwegian minimal stemmer 2013-02-12 16:32:38 +01:00
Igor Motov f98bd654a8 Fix filter cache stats calculation
Fixes #2609
2013-02-11 10:28:53 -05:00
uboness a2b87e28f6 fixed a bug in PrioritizedThreadPoolExecutor:
now execute(Runnable) satisfies the priority and fifo nature of same-priority runnables
2013-02-09 04:20:16 +01:00
uboness eef3610e12 fixed a bug in PrioritizedThreadPoolExecutor:
now execute(Runnable) verifies the command is added as Comparable
2013-02-09 03:33:12 +01:00
uboness 678a8664f6 fixed a bug in PrioritizedThreadPoolExecutor:
now execute(Runnable) verifies the command is added as PrioritizedRunnable
2013-02-09 03:26:52 +01:00
uboness 6d9048f8cc added priority support for cluster state updates:
* URGENT:
    * cluster_reroute (api)
    * refresh-mapping
    * cluster_update_settings
    * reroute_after_cluster_update_settings
    * create-index
    * delete-index
    * index-aliases
    * remove-index-template
    * create-index-template
    * update-mapping
    * remove-mapping
    * put-mapping
    * open-index
    * close-index
    * update-settings

* HIGH
    * routing-table-updater
    * zen-disco-node_left
    * zen-disco-master_failed
    * shard-failed
    * shard-started

* NORMAL
    * all other actions
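
The combination of priority ordering and FIFO ordering among same-priority tasks can be sketched with a heap and a submission counter. This is a conceptual illustration only: the `Priority` enum and `PrioritizedQueue` class are hypothetical Python stand-ins and do not mirror the Java `PrioritizedThreadPoolExecutor`.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    URGENT = 0
    HIGH = 1
    NORMAL = 2


class PrioritizedQueue:
    """Orders tasks by priority, then by submission order (FIFO)."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()

    def submit(self, priority, source):
        # The sequence number breaks ties so equal priorities stay FIFO.
        heapq.heappush(self._heap, (priority, next(self._sequence), source))

    def drain(self):
        while self._heap:
            yield heapq.heappop(self._heap)[2]


queue = PrioritizedQueue()
queue.submit(Priority.NORMAL, "other-task")
queue.submit(Priority.HIGH, "shard-started")
queue.submit(Priority.URGENT, "create-index")
queue.submit(Priority.URGENT, "put-mapping")
queue.submit(Priority.HIGH, "shard-failed")
order = list(queue.drain())
```

Draining yields the two URGENT updates in submission order, then the two HIGH updates in submission order, then the NORMAL one.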
2013-02-09 01:14:57 +01:00
Simon Willnauer f5331c9535 Cleanup NumericFieldData. FieldData interfaces are reduced to long and double while internal
representations still operate on the actual datatypes.
2013-02-08 20:58:36 +01:00
Martijn van Groningen 1189a2c2c2 Extended mv sorting integration test 2013-02-08 15:24:56 +01:00
Martijn van Groningen 8c7779057c Added sort by field that have multiple values per document.
Closes #2634
2013-02-08 13:28:40 +01:00
Simon Willnauer 033d6e4306 don't use subtraction for comparison if datatypes can overflow 2013-02-08 10:07:31 +01:00
Martijn van Groningen f97021b165 Fixes size assertion failure. 2013-02-07 16:50:54 +01:00
Martijn van Groningen e2cb7edb08 Added more info to assert 2013-02-07 13:52:25 +01:00
Martijn van Groningen e72e323c8a Attempt to fix "No active shards" failure 2013-02-07 10:14:10 +01:00
Lee Hinman ed43ad07d7 Throw a more meaningful message when no document is specified for indexing 2013-02-06 22:33:02 +01:00
Florian Schilling a52e01f3e5 Remove XTermsFilter and UidFilter in favour of Lucene 4.1 TermsFilter 2013-02-06 18:45:05 +01:00
Igor Motov 6890c9fa62 Move action.wait_on_mapping_change setting to pom 2013-02-06 11:48:58 -05:00
Igor Motov ed09ba0a18 Improve stability of RecoveryPercolatorTests
Without "action.wait_on_mapping_change" setting set to true, the test node might get shutdown before updated mapping is saved.
2013-02-05 14:53:46 -05:00
Igor Motov 8277833f8d Fix settings processing in WordDelimiterTokenFilterFactory 2013-02-05 10:03:00 -05:00
Martijn van Groningen 19295280d9 Made sure that wrapped child query / parent query gets rewritten only once. 2013-02-05 10:27:31 +01:00
Igor Motov 9e89323ad2 Add proper cleanup to InternalSettingsPerparerTests 2013-02-04 19:58:40 -05:00
Martijn van Groningen bc667c378e Made SoftWrapper fields final. 2013-02-04 14:47:36 +01:00
Martijn van Groningen 8109d13733 Use CacheRecycler when resolving parent docs in TopChildrenQuery. 2013-02-04 12:46:30 +01:00
Martijn van Groningen 9c3a86875b Removed `execution_type` for has_child and has_parent. 2013-02-04 11:37:40 +01:00
Igor Motov 20ce01bd53 Add additional query validation to the terms query parser
Fixes #2608
2013-02-03 09:44:16 -05:00
Shay Banon ebc0c8cc6d when we fix maxMergeAtOnce, make sure to not set it to 1 as it's an illegal value 2013-02-01 19:00:01 +01:00
Shay Banon a8c9e580ed add getMaxOrd, and properly document the difference between it and numOrds 2013-02-01 16:13:13 +01:00
Shay Banon 6f1932ab67 support yaml detection on char sequence 2013-02-01 12:46:19 +01:00
Simon Willnauer 6468c15446 check for == 0 rather than > 0 2013-02-01 11:11:47 +01:00
Simon Willnauer c18ae4a194 fix getMemorySizeInBytes in SparseMultiArrayOrdinals 2013-02-01 11:09:09 +01:00
Igor Motov 45b2bff8da Improve SearchStatsTests
Added refresh to guarantee that at least something will be fetched on a fast computer.
2013-01-31 21:19:08 -05:00
Igor Motov ca635deb36 Allow health to be executed on a local node instead of the master 2013-01-31 21:19:08 -05:00
Igor Motov 3c9541dd14 Make facet and sort tests more reliable in case of multiple nodes and shards
Stats, histogram and range facets and sorting currently fail if a field that they are running on is not defined in the mapping. In case of dynamic fields it might mean that by the time the facet query is executed the new field mapping might not be propagated to all nodes yet.
2013-01-31 21:19:07 -05:00
Igor Motov 6a01e7882c Improve shardsCleanup test
When startNode exits there is no guarantee that shard cleanup is finished because the cleanup operation is performed on another thread and startNode doesn't wait for it to complete. Therefore we might need to wait for the shard to disappear.
2013-01-31 21:18:14 -05:00
Igor Motov e32efba3d8 Improve RecoverAfterNodes tests 2013-01-31 20:05:55 -05:00
Martijn van Groningen 5e811e5382 Another small TopChildrenQuery cleanup. 2013-01-31 23:49:32 +01:00
Martijn van Groningen 7ef65688cd - TopChildrenQuery cleanup.
- Added class level jdocs for TopChildrenQuery and ChildrenQuery.
2013-01-31 23:38:09 +01:00
Simon Willnauer 1a1df06411 Move OrdsBuilding into a dedicated class and abstract integer pools used to build sparse ordinals 2013-01-31 19:02:31 +01:00
Martijn van Groningen 1f50b07406 Initial parent/child queries cleanup. 2013-01-31 18:39:31 +01:00
Martijn van Groningen 371b071fb7 Added notion of Rewrite that replaces ScopePhase 2013-01-31 17:24:46 +01:00
Martijn van Groningen d4ef4697d5 Also remove scope from facet builders. Fixes build. 2013-01-31 16:34:45 +01:00
Martijn van Groningen 46dd42920c Remove scope support in query and facet dsl.
Remove support for the `scope` field in facets and the `_scope` field in the nested and parent/child queries. The scope support for nested queries will be replaced by the `nested` facet option and a facet filter with a nested filter. The nested filter now supports a `join` option, which controls whether to perform the block join. It is enabled by default; when disabled, the filter returns the nested documents as hits instead of the joined root document.

Search request with the current scope support:
```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
    "query" : {
		"nested" : {
			"path" : "offers",
			"query" : {
				"match" : {
					"offers.color" : "blue"
				}
			},
			"_scope" : "my_scope"
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "offers.size"
			},
			"scope" : "my_scope"
		}
	}
}'
```

The following is the functional equivalent of using the scope support with nested queries:
```
curl -s -XPOST 'localhost:9200/products/_search?search_type=count' -d '{
    "query" : {
		"nested" : {
			"path" : "offers",
			"query" : {
				"match" : {
					"offers.color" : "blue"
				}
			}
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "offers.size"
			},
			"facet_filter" : {
				"nested" : {
					"path" : "offers",
					"query" : {
						"match" : {
							"offers.color" : "blue"
						}
					},
					"join" : false
				}
			},
			"nested" : "offers"
		}
	}
}'
```
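
The rewrite from the scoped request to its replacement can also be expressed mechanically. The following Python sketch is an illustration only: `replace_nested_scope` is a hypothetical helper, it handles just the case of a top-level `nested` query as in the requests above, and it leaves the `search_type=count` URL parameter aside.

```python
import copy


def replace_nested_scope(request):
    """Rewrite a request using `_scope`/`scope` into the facet_filter form.

    Hypothetical helper: only a top-level nested query is supported.
    """
    result = copy.deepcopy(request)
    nested = result["query"]["nested"]
    scope_name = nested.pop("_scope", None)
    for facet in result.get("facets", {}).values():
        if scope_name is not None and facet.get("scope") == scope_name:
            del facet["scope"]
            # Run the nested query as a filter on the nested docs themselves;
            # `join: false` keeps the nested docs instead of the root docs.
            facet["facet_filter"] = {
                "nested": {
                    "path": nested["path"],
                    "query": copy.deepcopy(nested["query"]),
                    "join": False,
                }
            }
            # Tell the facet to run on the nested documents under this path.
            facet["nested"] = nested["path"]
    return result
```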

The scope support for parent/child queries will be replaced by running the child query as filter in a global facet.

Search request with the current scope support:
```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
	"query" : {
		"has_child" : {
			"type" : "offer",
			"query" : {
				"match" : {
					"color" : "blue"
				}
			},
			"_scope" : "my_scope"
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "size"
			},
			"scope" : "my_scope"
		}
	}
}'
```

The following is the functional equivalent of using the scope support with parent/child queries:
```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
	"query" : {
		"has_child" : {
			"type" : "offer",
			"query" : {
				"match" : {
					"color" : "blue"
				}
			}
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "size"
			},
			"global" : true,
			"facet_filter" : {
				"term" : {
					"color" : "blue"
				}
			}
		}
	}
}'
```

Closes #2606
2013-01-31 15:09:57 +01:00
Martijn van Groningen 355381962b Use only the 'test' index, instead of all indices for child search benchmark. 2013-01-31 13:12:33 +01:00
Shay Banon 6cec73c201 remove fuzzy factor from mapping (internally implemented)
we want to support ~ notion in query parser for types other than strings, we are getting there, one can do now age:10~5, we would love to support it for dates, as in timestamp:2012-10-10~5d, but that requires changes in the query parser to support strings after the ~ sign
2013-01-31 12:23:03 +01:00
Igor Motov 8df7f2af0d Improve testReusePeerRecovery test 2013-01-30 19:51:41 -05:00
Igor Motov 29f4274213 Add index cleanup if index creation fails
Fixes #2590
2013-01-30 10:40:01 -05:00
Shay Banon 5c40c97e6e Id Cache: Allow to configure if ids should be reused (memory wise) or not, default to false
closes #2605
2013-01-30 14:42:07 +01:00
Martijn van Groningen bc20f068c9 Made `search_analyzer` updateable via put mapping api.
Closes #2604
2013-01-30 11:49:20 +01:00
Martijn van Groningen e074e00f76 Fielddata: Moved the growing logic to IntArrayRef 2013-01-30 11:20:41 +01:00
Martijn van Groningen f7692aeef2 Fielddata: IntArrayRef is initialized with small array and grows if needed 2013-01-30 10:57:52 +01:00
Simon Willnauer 5df37eaf75 add more advanced tests for phrase_prefix 2013-01-30 10:51:05 +01:00
Shay Banon f5e55b7cb9 properly print JVM version 2013-01-29 20:25:13 +01:00
Shay Banon 0568284147 reduce the memory needed while building the sparse array ordinals 2013-01-29 20:23:54 +01:00
Shay Banon 716f2aebbb add 0.20.5 2013-01-29 10:14:25 +01:00
Simon Willnauer 0697e2f23e use index prefix in tests to prevent misconfiguration 2013-01-28 15:51:06 +01:00
Simon Willnauer 72a2416a8c Support MultiPhrasePrefixQuery and MultiPhraseQuery in highlighters
Closes #2596
2013-01-28 15:41:25 +01:00
Martijn van Groningen 2e68207d6d Updated suggest api.
# Suggest feature
The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment the only supported suggester is `fuzzy`. The suggest feature is available from version `0.21.0`.

# Fuzzy suggester
The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query that is part of the request into account.

# Suggest API
The suggest request part is defined alongside the query part as a top-level field in the JSON request.

```
curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    ...
  },
  "suggest" : {
    ...
  }
}'
```

Several suggestions can be specified per request. Each suggestion is identified by an arbitrary name. In the example below two suggestions are requested. Both the `my-suggest-1` and `my-suggest-2` suggestions use the `fuzzy` suggester, but have a different `text`.

```
"suggest" : {
  "my-suggest-1" : {
    "text" : "the amsterdma meetpu",
    "fuzzy" : {
      "field" : "body"
    }
  },
  "my-suggest-2" : {
    "text" : "the rottredam meetpu",
    "fuzzy" : {
      "field" : "title"
    }
  }
}
```

The suggest response example below includes the suggestion response for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains entries. Each entry is effectively a token from the suggest text and contains the suggestion entry text, the original start offset and length in the suggest text and, if found, an arbitrary number of options.

```
{
  ...
  "suggest": {
    "my-suggest-1": [
      {
        "text" : "amsterdma",
        "offset": 4,
        "length": 9,
        "options": [
           ...
        ]
      },
      ...
    ],
    "my-suggest-2" : [
      ...
    ]
  }
  ...
}
```
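
To make `offset` and `length` concrete, the sketch below computes them for each token of a suggest text. It is only an approximation: in the suggester the tokens come from the analyzer, whereas this Python sketch splits on whitespace, and `suggest_entries` is a hypothetical name.

```python
import re


def suggest_entries(text):
    """Return one entry per whitespace-separated token of `text`.

    Whitespace splitting stands in for the real analyzer (assumption).
    `offset` is the 0-based start position, `length` the token length.
    """
    return [
        {"text": m.group(), "offset": m.start(), "length": len(m.group())}
        for m in re.finditer(r"\S+", text)
    ]


entries = suggest_entries("the amsterdma meetpu")
# entries[1] describes "amsterdma" with offset 4 and length 9,
# matching the response example above.
```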

Each options array contains option objects that include the suggested text, its document frequency and its score compared to the suggest entry text. The meaning of the score depends on the suggester used. The fuzzy suggester's score is based on the edit distance.

```
"options": [
  {
    "text": "amsterdam",
    "freq": 77,
    "score": 0.8888889
  },
  ...
]
```
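
The commit message only states that the score is based on the edit distance. As an assumption, the formula `1 - edits / min(len(term), len(option))`, with adjacent transpositions counted as a single edit, reproduces the score shown above for `amsterdma` and `amsterdam`. A minimal Python sketch of that assumed formula (`osa_distance` and `fuzzy_score` are hypothetical names, not the suggester's code):

```python
def osa_distance(a, b):
    """Optimal string alignment distance.

    Insertions, deletions, substitutions and adjacent transpositions
    each cost one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def fuzzy_score(term, option):
    """Assumed scoring formula: 1 - edits / length of the shorter string."""
    return 1.0 - osa_distance(term, option) / min(len(term), len(option))
```

Under this assumption `amsterdma` and `amsterdam` are one transposition apart, so the score is 1 - 1/9.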

# Global suggest text

To avoid repetition of the suggest text, it is possible to define a global text. In the example below the suggest text is defined globally and applies to the `my-suggest-1` and `my-suggest-2` suggestions.

```
"suggest" : {
  "text" : "the amsterdma meetpu",
  "my-suggest-1" : {
    "fuzzy" : {
      "field" : "title"
    }
  },
  "my-suggest-2" : {
    "fuzzy" : {
      "field" : "body"
    }
  }
}
```

The suggest text in the above example can also be specified as a suggestion-specific option. Suggest text specified at the suggestion level overrides the suggest text at the global level.
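
The override rule can be sketched as follows. This Python sketch works on the `suggest` part of a request as a plain dict; `effective_suggest_texts` is a hypothetical helper, not part of the API.

```python
def effective_suggest_texts(suggest):
    """Map each suggestion name to the suggest text that applies to it.

    A `text` set on the suggestion wins over the global `text`.
    """
    global_text = suggest.get("text")
    resolved = {}
    for name, body in suggest.items():
        if name == "text":
            continue  # the global text is not a suggestion itself
        text = body.get("text", global_text)
        if text is None:
            raise ValueError(f"suggestion {name!r} has no suggest text")
        resolved[name] = text
    return resolved
```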

# Other suggest example.

In the example below we request suggestions for the suggest text `devloping distibutd saerch engies` on the `title` field, with a maximum of 3 suggestions per term inside the suggest text. Note that this example uses the `count` search type. This isn't required, but it is a nice optimization. The suggestions are gathered in the `query` phase, so when we only care about suggestions (and not hits) we don't need to execute the `fetch` phase.

```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
  "suggest" : {
    "my-title-suggestions-1" : {
      "text" : "devloping distibutd saerch engies",
      "fuzzy" : {
        "size" : 3,
        "field" : "title"
      }
    }
  }
}'
```

The above request could yield the response shown below. Taking the first suggested option of each suggestion entry gives `developing distributed search engines`.

```
{
  ...
  "suggest": {
    "my-title-suggestions-1": [
      {
        "text": "devloping",
        "offset": 0,
        "length": 9,
        "options": [
          {
            "text": "developing",
            "freq": 77,
            "score": 0.8888889
          },
          {
            "text": "deloping",
            "freq": 1,
            "score": 0.875
          },
          {
            "text": "deploying",
            "freq": 2,
            "score": 0.7777778
          }
        ]
      },
      {
        "text": "distibutd",
        "offset": 10,
        "length": 9,
        "options": [
          {
            "text": "distributed",
            "freq": 217,
            "score": 0.7777778
          },
          {
            "text": "disributed",
            "freq": 1,
            "score": 0.7777778
          },
          {
            "text": "distribute",
            "freq": 1,
            "score": 0.7777778
          }
        ]
      },
      {
        "text": "saerch",
        "offset": 20,
        "length": 6,
        "options": [
          {
            "text": "search",
            "freq": 1038,
            "score": 0.8333333
          },
          {
            "text": "smerch",
            "freq": 3,
            "score": 0.8333333
          },
          {
            "text": "serch",
            "freq": 2,
            "score": 0.8
          }
        ]
      },
      {
        "text": "engies",
        "offset": 27,
        "length": 6,
        "options": [
          {
            "text": "engines",
            "freq": 568,
            "score": 0.8333333
          },
          {
            "text": "engles",
            "freq": 3,
            "score": 0.8333333
          },
          {
            "text": "eggies",
            "freq": 1,
            "score": 0.8333333
          }
        ]
      }
    ]
  }
  ...
}
```

# Common suggest options:
* `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.

# Common fuzzy suggest options
* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum corrections to be returned per suggest text token.
* `sort` - Defines how suggestions should be sorted per suggest text term. Two possible values:
** `score` - Sort by score first, then document frequency and then the term itself.
** `frequency` - Sort by document frequency first, then similarity score and then the term itself.
* `suggest_mode` - Controls which suggestions are included, or for which suggest text terms suggestions are made. Three possible values can be specified:
** `missing` - Only provide suggestions for suggest text terms that aren't in the index. This is the default.
** `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
** `always` - Suggest any matching suggestions based on terms in the suggest text.
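
The two `sort` values can be sketched as sort keys. This Python sketch assumes higher scores and higher document frequencies come first and that ties are broken by the term in ascending order; the option list above does not state the directions, but this ordering matches the response example. `sort_options` is a hypothetical name.

```python
def sort_options(options, sort="score"):
    """Order suggestion options according to the `sort` option.

    Assumption: score and frequency sort descending, the term ascending.
    """
    if sort == "score":
        key = lambda o: (-o["score"], -o["freq"], o["text"])
    elif sort == "frequency":
        key = lambda o: (-o["freq"], -o["score"], o["text"])
    else:
        raise ValueError(f"unknown sort: {sort!r}")
    return sorted(options, key=key)
```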

# Other fuzzy suggest options:
* `lowercase_terms` - Lowercases the suggest text terms after text analysis.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be 1 or 2. Any other value results in a bad request error. Defaults to 2.
* `min_prefix` - The minimum number of prefix characters that must match for a term to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Misspellings usually don't occur at the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Because terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise.
* `max_inspections` - A factor that is multiplied with `shard_size` in order to inspect more candidate spelling corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimum number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. This can improve quality by only suggesting high-frequency terms. Defaults to 0f, which means it is not enabled. If a value higher than 1 is specified, the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum number of documents a suggest text token can exist in and still be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified, it cannot be fractional. Defaults to 0.01f. This can be used to exclude high-frequency terms from being spellchecked. High-frequency terms are usually spelled correctly, and excluding them also improves spellcheck performance. The shard level document frequencies are used for this option.
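
How an absolute or relative value for `threshold_frequency` or `max_query_frequency` might be turned into a document count can be sketched as follows. This is a Python illustration under stated assumptions: values of 1 or more are treated as absolute counts, smaller values as a fraction of the shard's document count, and fractions are rounded up. `resolve_frequency` is a hypothetical name.

```python
import math


def resolve_frequency(value, doc_count):
    """Turn a frequency option into an absolute document count.

    Assumptions: value >= 1 is an absolute count and must be whole;
    0 <= value < 1 is a fraction of `doc_count`, rounded up.
    """
    if value < 0:
        raise ValueError("frequency must not be negative")
    if value >= 1:
        if value != int(value):
            raise ValueError("absolute document frequencies cannot be fractional")
        return int(value)
    return math.ceil(value * doc_count)
```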
2013-01-28 15:18:18 +01:00
Simon Willnauer 48488f707f Expose CommonTermsQuery in Match & MultiMatch and enable highlighting
Closes #2591
2013-01-28 11:57:05 +01:00
Shay Banon bfdf8fe590 Indexes created from index request might not replicate initial doc to replica
fixes #2594
2013-01-28 11:29:32 +01:00
Shay Banon 9539661d40 move facet reduce from facet process to the actual facet
this will simplify execution, and actually let the process just be a parser (rename will probably happen)
2013-01-27 13:45:38 +01:00
Shay Banon 360d7d9425 default for paged_bytes for string type
less memory overhead, though a bit slower on the execution side for facets, and might require more memory per facet execution
2013-01-26 15:11:14 +01:00
Simon Willnauer 5c89d66216 move ShardsAllocatorModuleTests to o.e.t.integration 2013-01-25 22:26:30 +01:00
Shay Banon 41cfe9cc27 add 0.20.4 2013-01-25 22:02:34 +01:00
Shay Banon 042a5d02d9 Primary shard failure with initializing replica shards can cause the replica shard to cause allocation failures
fixes #2592
2013-01-25 17:59:01 +01:00
Simon Willnauer a7bb3c29f2 Propagate exception during recovery if segment info cannot be opened but should be 2013-01-25 15:25:48 +01:00
Shay Banon 1be84c273b eagerly reroute when a node leaves the cluster 2013-01-25 15:23:05 +01:00
Martijn van Groningen a1ef1f02cc Exposed IndexOptions#DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS setting. 2013-01-25 00:02:43 +01:00
Shay Banon 45ed9ddba7 cleanup ordinals in field data 2013-01-24 22:31:52 +01:00
Shay Banon 990acff4f7 make sure we wait for yellow stats in suggest API when searching on clean index 2013-01-24 22:31:51 +01:00
Martijn van Groningen f974a17229 Removed AbstractFragmentsBuilder. Lucene's BaseFragmentsBuilder has now discrete multivalued highlighting and better support for requesting large number of fragments. 2013-01-24 22:15:07 +01:00
Martijn van Groningen e56b279624 Made BlockJoinScorer#freq() method handle freqs correctly (as is done in ToParentBlockJoinQuery) 2013-01-24 21:52:56 +01:00
Martijn van Groningen 9013eeae8a Added filter support in the `has_child` and `has_parent` filters.
Example:
```
curl -XPOST 'localhost:9200/_search' -d '{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "title": "distributed systems"
        }
      },
      "filter": {
        "has_child": {
          "type": "tag",
          "filter": {
            "term": {
              "name": "book"
            }
          }
        }
      }
    }
  }
}'
```

Closes #2585
2013-01-24 21:32:38 +01:00
Shay Banon a39469a252 gather the field data that are changed
(we will make use of that later)
2013-01-24 15:55:23 +01:00
Martijn van Groningen 98a674fc6e Added suggest api.
# Suggest feature
The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment the only supported suggester is `fuzzy`. The suggest feature is available since version `0.21.0`.

# Fuzzy suggester
The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query that is part of the request into account.

# Suggest API
The suggest request part is defined alongside the query part as a top-level field in the JSON request.

```
curl -s -XPOST 'localhost:9200/_search' -d '{
    "query" : {
        ...
    },
    "suggest" : {
        ...
    }
}'
```

Several suggestions can be specified per request. Each suggestion is identified by an arbitrary name. In the example below two suggestions are requested. The `my-suggest-1` suggestion uses the `body` field and `my-suggest-2` uses the `title` field. The `type` field is a required field and defines what suggester to use for a suggestion.

```
"suggest" : {
    "suggestions" : {
        "my-suggest-1" : {
            "type" : "fuzzy",
            "field" : "body",
            "text" : "the amsterdma meetpu"
        },
        "my-suggest-2" : {
            "type" : "fuzzy",
            "field" : "title",
            "text" : "the rottredam meetpu"
        }
    }
}
```

The suggest response example below includes the suggestions part for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains a terms array that holds all terms produced by analyzing the suggest text. Each term object includes the term itself, the original start and end offset in the suggest text and, if found, an arbitrary number of suggestions.

```
{
    ...
    "suggest": {
        "my-suggest-1": {
            "terms" : [
              {
                "term" : "amsterdma",
                "start_offset": 4,
                "end_offset": 13,
                "suggestions": [
                   ...
                ]
              }
              ...
            ]
        },
        "my-suggest-2" : {
          "terms" : [
            ...
          ]
        }
    }
    ...
}
```

Each suggestions array contains a suggestion object that includes the suggested term, its document frequency and score compared to the suggest text term. The meaning of the score depends on the used suggester. The fuzzy suggester's score is based on the edit distance.

```
"suggestions": [
    {
        "term": "amsterdam",
        "frequency": 77,
        "score": 0.8888889
    },
    ...
]
```

# Global suggest text

To avoid repetition of the suggest text, it is possible to define a global text. In the example below the suggest text is a global option and applies to the `my-suggest-1` and `my-suggest-2` suggestions.

```
"suggest" : {
    "suggestions" : {
        "text" : "the amsterdma meetpu",
        "my-suggest-1" : {
            "type" : "fuzzy",
            "field" : "title"
        },
        "my-suggest-2" : {
            "type" : "fuzzy",
            "field" : "body"
        }
    }
}
```

The suggest text can be specified as a global option or as a suggestion-specific option. Suggest text specified at the suggestion level overrides the suggest text at the global level.

# Other suggest example.

In the example below we request suggestions for the suggest text `devloping distibutd saerch engies` on the `title` field, with a maximum of 3 suggestions per term inside the suggest text. Note that this example uses the `count` search type. This isn't required, but it is a nice optimization. The suggestions are gathered in the `query` phase, so when we only care about suggestions (and not hits) we don't need to execute the `fetch` phase.

```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
  "suggest" : {
      "suggestions" : {
        "my-title-suggestions" : {
          "suggester" : "fuzzy",
          "field" : "title",
          "text" : "devloping distibutd saerch engies",
          "size" : 3
        }
      }
  }
}'
```

The above request could yield the response shown below. Taking the first suggested term of each suggest text term gives `developing distributed search engines`.

```
{
  ...
  "suggest": {
    "my-title-suggestions": {
      "terms": [
        {
          "term": "devloping",
          "start_offset": 0,
          "end_offset": 9,
          "suggestions": [
            {
              "term": "developing",
              "frequency": 77,
              "score": 0.8888889
            },
            {
              "term": "deloping",
              "frequency": 1,
              "score": 0.875
            },
            {
              "term": "deploying",
              "frequency": 2,
              "score": 0.7777778
            }
          ]
        },
        {
          "term": "distibutd",
          "start_offset": 10,
          "end_offset": 19,
          "suggestions": [
            {
              "term": "distributed",
              "frequency": 217,
              "score": 0.7777778
            },
            {
              "term": "disributed",
              "frequency": 1,
              "score": 0.7777778
            },
            {
              "term": "distribute",
              "frequency": 1,
              "score": 0.7777778
            }
          ]
        },
        {
          "term": "saerch",
          "start_offset": 20,
          "end_offset": 26,
          "suggestions": [
            {
              "term": "search",
              "frequency": 1038,
              "score": 0.8333333
            },
            {
              "term": "smerch",
              "frequency": 3,
              "score": 0.8333333
            },
            {
              "term": "serch",
              "frequency": 2,
              "score": 0.8
            }
          ]
        },
        {
          "term": "engies",
          "start_offset": 27,
          "end_offset": 33,
          "suggestions": [
            {
              "term": "engines",
              "frequency": 568,
              "score": 0.8333333
            },
            {
              "term": "engles",
              "frequency": 3,
              "score": 0.8333333
            },
            {
              "term": "eggies",
              "frequency": 1,
              "score": 0.8333333
            }
          ]
        }
      ]
    }
  }
  ...
}
```

# Common suggest options:
* `suggester` - The suggester implementation type. The only supported value is 'fuzzy'. This is a required option.
* `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.

# Common fuzzy suggest options
* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum corrections to be returned per suggest text token.
* `sort` - Defines how suggestions should be sorted per suggest text term. Two possible values:
** `score` - Sort by score first, then document frequency and then the term itself.
** `frequency` - Sort by document frequency first, then similarity score and then the term itself.
* `suggest_mode` - Controls which suggestions are included, or for which suggest text terms suggestions are made. Three possible values can be specified:
** `missing` - Only provide suggestions for suggest text terms that aren't in the index. This is the default.
** `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
** `always` - Suggest any matching suggestions based on terms in the suggest text.
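
The three `suggest_mode` values can be read as a filter over the candidate suggestions of a single suggest text term. The Python sketch below illustrates that reading; `apply_suggest_mode` is a hypothetical name, and candidates are modelled as `(term, doc_freq)` pairs.

```python
def apply_suggest_mode(mode, term_freq, candidates):
    """Filter candidates for one suggest text term.

    `term_freq` is the document frequency of the original term,
    `candidates` a list of (term, doc_freq) pairs.
    """
    if mode == "missing":
        # Only terms that are absent from the index get suggestions.
        return [] if term_freq > 0 else list(candidates)
    if mode == "popular":
        # Keep only candidates that occur in more docs than the original term.
        return [c for c in candidates if c[1] > term_freq]
    if mode == "always":
        return list(candidates)
    raise ValueError(f"unknown suggest_mode: {mode!r}")
```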

# Other fuzzy suggest options:
* `lowercase_terms` - Lowercases the suggest text terms after text analysis.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be 1 or 2. Any other value results in a bad request error. Defaults to 2.
* `min_prefix` - The minimum number of prefix characters that must match for a term to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Misspellings usually don't occur at the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Because terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise.
* `max_inspections` - A factor that is multiplied with `shard_size` in order to inspect more candidate spelling corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimum number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. This can improve quality by only suggesting high-frequency terms. Defaults to 0f, which means it is not enabled. If a value higher than 1 is specified, the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum number of documents a suggest text token can exist in and still be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified, it cannot be fractional. Defaults to 0.01f. This can be used to exclude high-frequency terms from being spellchecked. High-frequency terms are usually spelled correctly, and excluding them also improves spellcheck performance. The shard level document frequencies are used for this option.
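
Why a larger `shard_size` gives more accurate document frequencies can be seen from a sketch of the reduce phase. The Python sketch below is an illustration, not the actual reduce code: it assumes that the frequencies reported by the shards are summed per suggested term and that the merged list is ordered by score, then frequency, then term. `reduce_suggestions` is a hypothetical name.

```python
def reduce_suggestions(shard_results, size):
    """Merge per-shard suggestion lists and keep the top `size` entries.

    Assumption: a term's document frequency is the sum of the frequencies
    reported by the shards that returned it.
    """
    merged = {}
    for suggestions in shard_results:
        for s in suggestions:
            entry = merged.setdefault(
                s["term"],
                {"term": s["term"], "frequency": 0, "score": s["score"]},
            )
            entry["frequency"] += s["frequency"]
    ranked = sorted(
        merged.values(),
        key=lambda s: (-s["score"], -s["frequency"], s["term"]),
    )
    return ranked[:size]
```

A term that falls outside the top `shard_size` on one shard contributes nothing from that shard, so its merged frequency is too low; retrieving more suggestions per shard reduces that error.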

 Closes #2585
2013-01-24 15:41:06 +01:00
Shay Banon 9673a1c366 expose field data settings in mapping, they can be updated using merge mapping 2013-01-24 15:33:24 +01:00
Simon Willnauer 4eefcb9c82 Expose CommonTermsQuery
Closes #2583
2013-01-24 14:18:01 +01:00
Simon Willnauer c4eab90b2e Cleanup MatchQuery 2013-01-24 14:11:56 +01:00
Shay Banon c2f35621f6 allow to get settings as delimited string 2013-01-24 12:03:16 +01:00
Shay Banon b143822bac allow to load settings from delimited string 2013-01-24 12:00:14 +01:00
Simon Willnauer 88f68264c7 Reuse MemoryIndex instances across Percolator requests.
* added configurable MemoryIndexPool that pools MemoryIndex instance across Threads
* Pool can be configured based on the number of pooled instances as well as the maximum number of bytes that is reused across the pooled instances
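
The two limits can be illustrated with a small bounded pool. This is a Python sketch of the idea only, not the `MemoryIndexPool` implementation; `BoundedPool`, `factory` and `size_of` are hypothetical names, and resetting an instance before reuse is left out.

```python
import threading


class BoundedPool:
    """Pool limited by instance count and by total bytes of idle instances."""

    def __init__(self, factory, size_of, max_instances, max_bytes):
        self._factory = factory
        self._size_of = size_of
        self._max_instances = max_instances
        self._max_bytes = max_bytes
        self._lock = threading.Lock()
        self._idle = []        # list of (instance, size in bytes)
        self._idle_bytes = 0

    def acquire(self):
        """Reuse an idle instance if there is one, else create a new one."""
        with self._lock:
            if self._idle:
                obj, size = self._idle.pop()
                self._idle_bytes -= size
                return obj
        return self._factory()

    def release(self, obj):
        """Return True if `obj` was kept for reuse, False if it was dropped."""
        size = self._size_of(obj)
        with self._lock:
            fits_count = len(self._idle) < self._max_instances
            fits_bytes = self._idle_bytes + size <= self._max_bytes
            if fits_count and fits_bytes:
                self._idle.append((obj, size))
                self._idle_bytes += size
                return True
        return False
```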

Closes #2581
2013-01-24 11:53:21 +01:00
Shay Banon e8c1180ede add field data stats 2013-01-24 11:38:18 +01:00
Shay Banon 613b746299 move field data type to simply be type and settings 2013-01-24 09:33:16 +01:00
Martijn van Groningen 50ac477d92 Fixed small bug. Index name should be used to lookup entry. 2013-01-23 23:53:20 +01:00
Shay Banon 4967a97faf don't use private since its accessed from inner class, remove $$ need 2013-01-23 22:17:27 +01:00
Martijn van Groningen 346422b747 Added sparse multi ordinals implementation for field data. 2013-01-23 22:11:31 +01:00
Daniel Muller 9e79f54cb1 Check for java-6-openjdk-amd64 2013-01-23 18:34:37 +01:00
synhershko e0f711a94a Updating Lucene version 2013-01-23 16:18:18 +02:00
Shay Banon a74e7f8099 refactor geo to extract common classes 2013-01-23 14:14:21 +01:00
Simon Willnauer 9c729fad2c remove flush check IW#commit always adds a commit point now even if nothing has changed ie. docs are added, updated or deleted. 2013-01-23 14:06:01 +01:00
Shay Banon 22f0e79a84 use merge trigger to control when to do merges
now with merge trigger, we can simply decide when to do merges based on it
2013-01-23 13:24:20 +01:00
Shay Banon d969e61999 Remove settings option for index store compression, compression is always enabled
closes #2577
2013-01-23 13:11:48 +01:00
Simon Willnauer 2880cd0172 Upgrade to Lucene 4.1
* Removed CustomMemoryIndex in favor of MemoryIndex, which as of 4.1 supports adding the same field twice
* Replaced duplicated logic in X[*]FSDirectory for rate limiting with a RateLimitedFSDirectory wrapper
* Remove hacks to find out merge context in rate limiting in favor of IOContext
* replaced Scorer#freq() return type (from float to int)
* Upgraded FVHighlighter to new 'centered' highlighting
* Fixed RobinEngine to use separate setCommitData
2013-01-23 11:54:11 +01:00
Shay Banon 20f43bf54c add hasSingleArrayBackingStorage
allow for optimization only when there really is a single array, and not when there is a multi dimensional one
2013-01-23 10:24:43 +01:00
Igor Motov bbfd3957eb Improve stability of the testNodesInfos test 2013-01-22 12:29:38 -05:00
Igor Motov 9becdb814a Improve stability of the shardsCleanup test 2013-01-22 10:20:18 -05:00
Shay Banon c295211a85 final move to new field data 2013-01-22 16:16:33 +01:00
Shay Banon 27bfb341ff better logging on missing format, and allow to configure format on a type on the index level 2013-01-22 16:16:33 +01:00
uboness 09cc70b8c9 added predefined empty implementation for all atomic field datas 2013-01-22 16:16:33 +01:00
Shay Banon 6b92b592b4 allow to clear by reader the new field data cache 2013-01-22 16:16:32 +01:00
Shay Banon c67386f644 properly invalidate on core closed reader 2013-01-22 16:16:32 +01:00
Shay Banon af757fd821 more usage of field data
note, removed field data from cache stats, it will have its own stats later on (cache part is really misleading)
2013-01-22 16:16:32 +01:00
Shay Banon de013babf8 move geo filters and numeric range to use new field data 2013-01-22 16:16:32 +01:00
Shay Banon be1e5becbb move scripts to use new field data 2013-01-22 16:16:32 +01:00
Shay Banon 772ee9db54 move terms to use new field data 2013-01-22 16:16:32 +01:00
Shay Banon e5b651321f remove some safe methods because of the new makeSafe method usage 2013-01-22 16:16:32 +01:00
Shay Banon f189a832c5 grr pages -> paged 2013-01-22 16:16:32 +01:00
Shay Banon 5b7173fc35 move sorting to work with new field data 2013-01-22 16:16:32 +01:00
uboness b739bf97d4 added missing dedicated value comparators for the different indices field data 2013-01-22 16:16:32 +01:00
Shay Banon 45f27fe96a add packed bytes variant for strings/bytes 2013-01-22 16:16:32 +01:00
uboness 855b64a8a7 byte field data implementation 2013-01-22 16:16:31 +01:00
uboness f1f3c241fd short field data implementation 2013-01-22 16:16:31 +01:00
uboness 3840439365 float field data implementation 2013-01-22 16:16:31 +01:00
Shay Banon 9137fcc6fc move geo distance sorting to use new field data 2013-01-22 16:16:31 +01:00
Shay Banon d5e70a27df integer type to support int field data type 2013-01-22 16:16:31 +01:00
uboness fc09ce7ac9 Implemented int field data 2013-01-22 16:16:31 +01:00
Shay Banon d82859c82b geo point new field mapper with geo distance facet based impl 2013-01-22 16:16:31 +01:00
Shay Banon 2e86081f7b use smartNameMapper on context 2013-01-22 16:16:31 +01:00
Shay Banon d88e3f73ac add specific makeSafe method to make an unsafe (shared) bytes based value to a "safe" one 2013-01-22 16:16:31 +01:00
Shay Banon 1765b0b813 date histogram to use new field data 2013-01-22 16:16:31 +01:00
Shay Banon 37acba1b57 terms stats to use new field data 2013-01-22 16:16:31 +01:00
Shay Banon f1f86efed5 move statistical facet to use new field data 2013-01-22 16:16:30 +01:00
Shay Banon 699ff2782e move histogram facet to use new field data 2013-01-22 16:16:30 +01:00
Shay Banon 8c7e0f5ca1 fix getOrds on single array ords 2013-01-22 16:16:30 +01:00
Shay Banon fa363b2dca move range facet to use new field data abstraction 2013-01-22 16:16:30 +01:00
Shay Banon 692413862a add clear when deleting an index for the field data service 2013-01-22 16:16:30 +01:00
Shay Banon a39ca58de9 add field data service to index level services 2013-01-22 16:16:30 +01:00
Shay Banon 2d91939253 add initial field data type support to mappers
hardwired and still happily lives with current field data impl
2013-01-22 16:16:30 +01:00
Shay Banon e0b280f9b3 use FieldMapper.Names for fieldNames, and not just fieldName as string 2013-01-22 16:16:30 +01:00
Shay Banon 7dc5cf9799 add long field support 2013-01-22 16:16:30 +01:00
Shay Banon 7397007e05 initial commit 2013-01-22 16:16:30 +01:00
Clinton Gormley 7cfdd9ef59 Corrected filter strategy option in FilteredQueryParser
Changed from 'query_filter' to 'query_first'
2013-01-22 12:54:00 +01:00
Simon Willnauer 0b730aae81 Pass on filterStrategy in XFilteredQuery if query is rewritten 2013-01-22 12:40:21 +01:00
Martijn van Groningen a5bd57ed6c Added trace log statement, to catch stacktraces 2013-01-20 23:17:18 +01:00
Simon Willnauer 35cf9ee11d wait for cluster to be formed in SimpleNodesInfoTests 2013-01-19 15:44:26 +01:00
Simon Willnauer d6b613ac8c Respect lowercase_expanded_terms in MappingQueryParser
Fixes #2566
2013-01-19 13:57:45 +01:00
Simon Willnauer 31fd521fd1 provide more information if a null DocumentMapper is returned 2013-01-18 16:43:56 +01:00
Simon Willnauer c563248f76 testMoreLikeThisIssue2197 should create index mapping first to prevent races 2013-01-18 16:41:37 +01:00
Simon Willnauer 6f38a3a8a8 create index and mapping first to ensure all relevant nodes see the mapping 2013-01-18 16:09:24 +01:00
Simon Willnauer 393de984bd Remove deprecated StreamInput/Output#read/writeUTF 2013-01-17 22:38:42 +01:00
Simon Willnauer d37c844da0 use camelcase for getters 2013-01-17 22:27:44 +01:00
Simon Willnauer 3d80c53192 Allow ShardsAllocator to be configured via node level settings.
* Default ShardsAllocator is set to BalancedShardsAllocator
* Core ShardsAllocator implementations can be defined via 'cluster.routing.allocation.type'
* Core ShardsAllocator implementations are exposed via short keys 'balanced' (BalancedShardsAllocator) and 'even_shards' (EvenShardsCountAllocator)
* Third party allocators can be loaded via fully-qualified class names.

Closes #2557
2013-01-17 16:23:52 +01:00
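The lookup described in the commit above (short keys for core allocators, fully-qualified class names for third-party ones, `balanced` as the default) could be sketched roughly as follows. This is a minimal illustration, not the actual Elasticsearch code: the class `AllocatorTypeSketch`, the method `resolve`, and the use of bare class names without packages are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps the value of 'cluster.routing.allocation.type'
// to an allocator class name. Names here are illustrative only.
final class AllocatorTypeSketch {
    // Default taken from the commit message: BalancedShardsAllocator.
    static final String DEFAULT_KEY = "balanced";

    private static final Map<String, String> SHORT_KEYS = new HashMap<>();
    static {
        // Short keys named in the commit message; package names are omitted
        // because the message does not state them.
        SHORT_KEYS.put("balanced", "BalancedShardsAllocator");
        SHORT_KEYS.put("even_shards", "EvenShardsCountAllocator");
    }

    // Returns the class name for a configured value. Unknown values are
    // treated as fully-qualified class names of third-party allocators.
    static String resolve(String configured) {
        String key = (configured == null || configured.isEmpty()) ? DEFAULT_KEY : configured;
        String shortHit = SHORT_KEYS.get(key);
        return shortHit != null ? shortHit : key;
    }
}
```

With this sketch, `resolve("even_shards")` yields `EvenShardsCountAllocator`, while a value such as `com.example.MyAllocator` (a made-up name) is passed through unchanged for class loading.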
Simon Willnauer 2eb09e6b1a Added BalancedShardsAllocator that balances shards based on a weight function.
* Weights are calculated per index and incorporate index level, global and primary related parameters
* Balance operations are executed based on a win maximization strategy that tries to relocate shards
  first that offer the biggest gain towards the weight function's optimum
* The WeightFunction allows settings to prefer index based balance over global balance and vice versa
* Balance operations can be throttled by raising a threshold resulting in less aggressive balance operations
* WeightFunction ships with defaults to achieve evenly distributed indexes while maintaining a global balance

Closes #2555
2013-01-17 12:02:42 +01:00
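One way to picture the weight function and threshold described in the commit above is a weighted sum of how far a node deviates from the per-node averages. This is only a sketch under assumptions: the class `BalanceWeightSketch`, its method names, the parameter names, and the exact linear form are hypothetical and are not taken from the commit itself.

```java
// Hypothetical sketch of a balance weight: a positive result means the node
// holds more than its share, a negative result means it holds less.
final class BalanceWeightSketch {

    // Combines index-level, global and primary-related deviations from the
    // per-node averages, each scaled by its own factor (assumed linear form).
    static double weight(double indexFactor, double globalFactor, double primaryFactor,
                         int shardsOfIndexOnNode, double avgShardsOfIndexPerNode,
                         int shardsOnNode, double avgShardsPerNode,
                         int primariesOnNode, double avgPrimariesPerNode) {
        double indexPart = indexFactor * (shardsOfIndexOnNode - avgShardsOfIndexPerNode);
        double globalPart = globalFactor * (shardsOnNode - avgShardsPerNode);
        double primaryPart = primaryFactor * (primariesOnNode - avgPrimariesPerNode);
        return indexPart + globalPart + primaryPart;
    }

    // Throttling: a move is only considered when the gap between the heaviest
    // and lightest node exceeds the threshold, so a higher threshold means
    // less aggressive balancing.
    static boolean shouldRebalance(double maxWeight, double minWeight, double threshold) {
        return (maxWeight - minWeight) > threshold;
    }
}
```

Raising `indexFactor` relative to `globalFactor` in this sketch favors per-index balance over global balance, and the reverse favors global balance, which mirrors the trade-off the commit message describes.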
Igor Motov d97839b8a8 Fix char filter issues introduced during lucene 4 migration
Fixes #2543
2013-01-14 12:43:02 -05:00
Igor Motov e82f96f1e5 Make script cache configurable and bounded
Fixes #2539
2013-01-14 06:57:13 -05:00
Igor Motov 6243f8e64d Disallow unknown custom indexing parameters
Fixes #2354
2013-01-11 10:14:25 -05:00
Martijn van Groningen 1ce10dfb06 Fixed issue where parent & child queries can fail if a segment doesn't have documents with the targeted type or associated parent type
Closes #2537
2013-01-11 16:06:14 +01:00
Martijn van Groningen 43aabe88e8 Fixed document already exists error when concurrently sending update request with upsert using the same id.
Closes #2530
2013-01-10 14:25:44 +01:00
Shay Banon 6f7253c524 Comments are not allowed in mapping
checked jackson, there won't be an overhead in enabling comments. Added, with the caveat that when used with mappings, and calling "get mapping", the comments will not be returned
closes #1394
2013-01-07 04:21:41 +01:00
Shay Banon 2c4b9d9ba2 cleanup queryHint since it was never used
preference ended up as the way to control routing
2013-01-07 04:02:45 +01:00
Shay Banon bcdda811ef add read/write optional text 2013-01-07 02:54:22 +01:00