Commit Graph

1782 Commits

Author SHA1 Message Date
Martijn van Groningen c21ab1a9cf Return proper response code for delete by query api in the case of failures.
Closes #2963
2013-05-01 11:53:40 +02:00
Igor Motov 6437c51501 Improve stability of SimpleRecoveryLocalGatewayTests
Fixed testX and testSingleNodeNoFlush by specifying mapping on index creation instead of using dynamic mapping. Dynamic mapping is updated on the cluster level asynchronously, and if mapping changes are not applied to the cluster state before the node is closed, these changes are not available after node restart. While data added in the test is preserved, the test still fails due to the absence of the mapping. This is a known issue that we are not planning to fix at the moment.
2013-04-30 12:11:30 -04:00
Alexander Reelsen a694e97ab9 Support source include/exclude for realtime GET
Currently realtime GET does not take source includes/excludes into account.
This patch adds support for the source field mapper includes/excludes
when getting an entry from the transaction log. Even though it introduces
a slight performance penalty, it now adheres to the defined configuration
instead of returning all source data when a realtime get is done.
2013-04-30 17:48:03 +02:00
Alexander Reelsen d5f4c8230d XContentMapValues.filter now works with nested arrays
The filter method of XContentMapValues actually filtered out nested
arrays/lists completely due to a bug in the filter method, which threw
away all data inside of such an array.

Closes #2944
This bug was a follow up problem, because of the filtering of nested arrays
in case source exclusion was configured.
2013-04-30 17:33:09 +02:00
Simon Willnauer 773ea0306b Fail with IAE if a numeric field is used for the analysis endpoint.
Analysing a numeric field will return UTF-16 representations of
Lucene's numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the lucene index. Passing a numeric field
to the analysis action is most likely a bug.

Closes #2953 #2952
2013-04-30 16:07:11 +02:00
Simon Willnauer 8c6ba59b83 Upgrade Lucene Version to 4.2. The latest Elasticsearch version must
use the latest Lucene version as specified in o.e.common.lucene.Lucene
and must be upgraded with each lucene release.

This commit adds an assert that fails once the actual lucene version
that is used is higher than the current releases version.
2013-04-30 14:06:57 +02:00
Simon Willnauer 42b9674d0c added simple test for numeric match query 2013-04-30 13:53:49 +02:00
Shay Banon 6c3bb4dcdd move to 1.0.0.Beta1 snap 2013-04-29 13:51:09 +02:00
Shay Banon cb75ce0caa release 0.90.0 GA 2013-04-29 13:41:43 +02:00
Shay Banon 9ded2405a0 Use Lucene Version that was used to create the index in Analysis
Lucene ships with a version constant that is mainly used to provide consistent behaviour across lucene release versions. Lucene's Analysis capabilities are commonly applied at index and search time such that the search-time behaviour should be identical to the index-time behaviour in most of the cases. Currently ElasticSearch always uses the latest version from Lucene which can break backwards compatibility with the index for users that rely on behaviour that changed in new Lucene version.

Users should always use the version the index was created with unless it's explicitly configured.

closes #2945
2013-04-29 13:18:51 +02:00
Simon Willnauer bd7ff6946e Added X Versions of NGramTokenFilter and NGramTokenizer to ElasticSearch. These versions
don't produce broken positions anymore and prevent certain highlighter bugs that fail with
StringArrayOutOfBoundsExceptions as in #2931

This commit breaks backwards compatibility in terms of highlighting when NGramTokenFilter is used.
The highlighter will highlight the entire terms as produced by the tokenizer instead of the individual
sub-gram. To do sub-gram highlighting, the ngram tokenizer should be used. This behavior was based on
broken NGramTokenFilter behavior which will be fixed in Lucene 4.4 but was ported in this commit
to elasticsearch 0.90. The broken behavior can still be used if a version < LUCENE_42 is used
in the token filter mapping.

Closes #2931
2013-04-27 16:48:25 +02:00
Shay Banon f09ad507a4 open context stats
- rename to open_contexts from open, we might have other open stats in the future related to search (lucene index searchers?)
- add a test to verify it works
2013-04-27 15:09:47 +02:00
Simon Willnauer 8a7f81104f Remove XSimpleFragmentsBuilder and XScoreOrderFragmentsBuilder since the only difference
to the lucene version is that `discreteMultiValueHighlighting` does default to `true`. Yet
we set this anyway in the HighlightingPhase such that the classes are obsolete.
2013-04-26 20:04:38 +02:00
Simon Willnauer 355f80adc9 Added temporary fix for LUCENE-4899 where FastVectorHighlighter failed with StringIndexOutOfBoundsException
if a single highlight phrase or term was greater than the fragCharSize producing negative string offsets

The fixed BaseFragListBuilder was added as XSimpleFragListBuilder which triggers an assert once Elasticsearch
upgrades to Lucene 4.3
2013-04-26 19:48:48 +02:00
Simon Willnauer 2ed2fab904 Add assert that fails once Elasticsearch upgrades to Lucene 4.3 in order to remove the duplicated class 2013-04-26 19:16:21 +02:00
Alexander Reelsen 90353ceb79 Fixing possible NoClassDefFoundError when trying to load nonexisting classes
In order to handle exceptions correctly, when classes are not found, one
needs to handle ClassNotFoundException as well as NoClassDefFoundError
in order to be sure to have caught every possible case. We did not cater
for the latter in ImmutableSettings yet.

This fix is just executing the same logic for both exceptions instead of
simply bubbling up NoClassDefFoundError.
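
A minimal sketch of the idea, assuming a hypothetical `loadOrDefault` helper (the name and the fallback behaviour are illustrative and are not taken from ImmutableSettings). It only shows both failure types being routed through the same handling:

```java
final class SafeClassLoading {
    // Hypothetical helper for illustration; not the actual ImmutableSettings code.
    // Both failure modes take the same path instead of letting the Error bubble up.
    static Class<?> loadOrDefault(String className, Class<?> fallback) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            // ClassNotFoundException: the named class is not on the classpath.
            // NoClassDefFoundError: a class definition needed while loading
            // (typically a dependency) could not be found.
            return fallback;
        }
    }
}
```

For example, `SafeClassLoading.loadOrDefault("no.such.Clazz", String.class)` returns `String.class` rather than throwing.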
2013-04-26 10:34:10 +02:00
Alexander Reelsen 22e25cc165 Added stolen time to OsStats output 2013-04-25 10:46:24 +02:00
Shay Banon c4968d7d65 no longer support snappy... 2013-04-25 09:38:58 +02:00
Igor Motov 982b570037 Fix serialization of sync/async replication type 2013-04-25 08:25:31 +02:00
Martijn van Groningen dd12e0b86c If searchContext not set, abort parsing and throw ISE 2013-04-24 10:24:15 +02:00
Simon Willnauer c884304753 Fall back to local statistics if global statistics are not available for a field or term
Closes #2926
2013-04-23 13:32:35 +02:00
Simon Willnauer f372f7c109 Cut over StringScriptDataComparator to use BytesRef instead of Strings
Closes #2920
2013-04-23 13:29:19 +02:00
Simon Willnauer 7a36bed031 Remove per-doc ord collector callback in favor of an iterator 2013-04-23 10:35:40 +02:00
Martijn van Groningen c390f9b1a9 Added more test assertions 2013-04-19 22:16:42 +02:00
Simon Willnauer 7ea6cd6888 use Double/Float.compare for stable and correct float sort order 2013-04-19 21:40:01 +02:00
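
Why `Double.compare` matters for sorting can be sketched as follows. This is an illustration only; `FloatOrder`, `NAIVE` and `STABLE` are made-up names and not code from the commit. A comparator built from `<` and `>` treats NaN as equal to everything and `-0.0` as equal to `0.0`, while `Double.compare` defines a total order:

```java
import java.util.Arrays;
import java.util.Comparator;

final class FloatOrder {
    // Naive comparator: any comparison involving NaN yields 0, and -0.0 ties
    // with 0.0, so the resulting order is not a consistent total order.
    static final Comparator<Double> NAIVE = (a, b) -> a < b ? -1 : (a > b ? 1 : 0);

    // Double.compare orders -0.0 before 0.0 and places NaN after all other values.
    static final Comparator<Double> STABLE = Double::compare;

    // Returns a sorted copy using the total order defined by Double.compare.
    static Double[] sorted(Double... values) {
        Double[] copy = values.clone();
        Arrays.sort(copy, STABLE);
        return copy;
    }
}
```

With this sketch, `FloatOrder.sorted(Double.NaN, 1.0, -0.0, 0.0)` yields `-0.0, 0.0, 1.0, NaN`.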
Clinton Gormley 1483a3a0e5 Added tests for multi_match with minimum_should_match 2013-04-19 21:40:01 +02:00
Clinton Gormley e508b27203 Apply minimum_should_match to inner clauses of multi_match query
When specifying minimum_should_match in a multi_match query it was being applied
to the outer bool query instead of to each of the inner field-specific bool queries.

Closes #2918
2013-04-19 21:39:54 +02:00
Simon Willnauer 3ab56e16b7 Support empty string in FSTBytesAtomicFieldData 2013-04-19 12:49:06 +02:00
Simon Willnauer a1c62759c9 remove size bound from cache recycler for performance reasons 2013-04-19 12:36:12 +02:00
Simon Willnauer 2d13aa29f8 s/ES.RECYCLE/es.cache.recycle 2013-04-19 11:48:28 +02:00
Simon Willnauer 05b6c46bec allow CacheRecycler to be cleared via the REST API 2013-04-19 11:45:33 +02:00
Simon Willnauer 79db1bfbf0 make object caching optional 2013-04-18 19:14:19 +02:00
Florian Schilling 54cb4b9615 # Response for Cluster Settings Update API
If cluster settings are updated, the REST API returns the accepted values. For
example, updating the `cluster.routing.allocation.disable_allocation` via
cluster settings:

```sh
curl -XPUT http://localhost:9200/_cluster/settings -d '{
    "transient":{
        "cluster.routing.allocation.disable_allocation":"true"
    }
}'
```

will respond:

```json
{
    "persistent":{},
    "transient":{
        "cluster.routing.allocation.disable_allocation":"true"
    }
}
```

Closes #2907
2013-04-18 11:34:58 +02:00
Lucas Ward 99c101c37e If a value/field is a Calendar, it will be converted to a Date using getTime()
Closes #2911
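
The conversion can be sketched like this. `CalendarValues.normalize` is a hypothetical helper used only for illustration; the actual change lives in the content-building code, which is not shown here:

```java
import java.util.Calendar;
import java.util.Date;

final class CalendarValues {
    // Hypothetical helper: a Calendar is replaced by the Date it represents,
    // every other value is passed through unchanged.
    static Object normalize(Object value) {
        if (value instanceof Calendar) {
            return ((Calendar) value).getTime();
        }
        return value;
    }
}
```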
2013-04-18 10:57:08 +02:00
Shay Banon 0eb298fe64 use more aggressive concurrency levels for CHM
- long running ones with high update rates
- also expose a *system* property of es.useConcurrentHashMapV8 to use the new non blocking Java8 CHM impl
2013-04-17 14:28:38 -07:00
Shay Banon 271305d5eb Search Stats: Add current open searches
closes #2906
2013-04-16 18:08:57 -07:00
Simon Willnauer efc9e8fe7b only return primary if it is active in PlainOperationRouting
Closes #2896
2013-04-16 17:20:22 +02:00
Martijn van Groningen bcc16654d2 Better error messaging when postings_format can't be resolved or when a custom postings_format type can't be instantiated.
Relates to #2893
2013-04-16 16:29:54 +02:00
Martijn van Groningen 9a1c03408b Added support for the `_cache` and `_cache_key` options to the `has_child` and `has_parent` filters.
Closes #2900
2013-04-16 14:42:45 +02:00
Florian Schilling ef5b7412e6 Allow PolygonBuilder to create polygons with hole
Closes #2899
2013-04-16 11:22:48 +02:00
Simon Willnauer 30f9f278c3 Added UNICODE_CHARACTER_CLASS support to Regex flags. This flag is only supported in Java7 and is ignored if set on a java 6 JVM
Closes #2895
2013-04-16 10:06:53 +02:00
uboness eb21526552 Added missing support for lat, lats, lon, lons for doc notation in scripts 2013-04-13 13:58:30 -07:00
uboness 20e6df9f34 Optimization in fielddata cache where ordinals are used instead of flat arrays when number of unique values is low 2013-04-13 12:42:53 -07:00
Igor Motov e7b49d8936 Add more dynamic settings validation 2013-04-12 20:55:45 -04:00
Shay Banon d385e1b356 Clear Cache API: Streamline option names
closes #2890
2013-04-12 15:58:24 -07:00
Shay Banon a2d72697eb Expose field level field data statistics
closes #2889
2013-04-12 15:51:08 -07:00
David Pilato 3b7a195f6f Add toString() for FilterBuilders
Closes #2887.
2013-04-12 22:27:51 +02:00
Martijn van Groningen bf21466291 CacheTests test fix. 2013-04-12 19:14:38 +02:00
Martijn van Groningen 80dbca0809 Field data: Try to load short values as byte values and load int values as short or byte values to reduce the size they take in memory. 2013-04-12 19:11:18 +02:00
Shay Banon 5fbd4a12a0 fix memory computation for int field data 2013-04-12 08:38:52 -07:00
Martijn van Groningen 5c90e5f940 If no options are specified with the clear cache api then all caches should be cleared.
Closes #2886
2013-04-12 15:24:50 +02:00
Igor Motov 00c035f88c Make sure that settings are propagated to all nodes 2013-04-11 10:59:14 -04:00
Martijn van Groningen 2dfcc3c740 Test that size is actually computed.
Relates to #2882
2013-04-11 10:22:48 +02:00
Simon Willnauer 9a2d27a035 rename prefix_length to prefix_len for consistency
Closes #2883
2013-04-10 17:39:32 +02:00
Martijn van Groningen 4fd8c2c6d2 Ordinals were omitted from fielddata cache size calculation if field has more than one term.
Closes #2882
2013-04-10 14:50:07 +02:00
Martijn van Groningen 637eeacb20 Better error description if field(s) (statistical facet) and value_field (term_stats facet) are not a numeric field 2013-04-10 11:11:52 +02:00
Martijn van Groningen 6a3c53ef44 Should prevent OOM 2013-04-10 10:00:51 +02:00
Martijn van Groningen b8b28041e5 Fix for extended facets test. 2013-04-10 00:47:00 +02:00
Igor Motov b0e44a2b40 Fix term counters in script field terms facet
Fixes #2878
2013-04-09 12:42:35 -04:00
Simon Willnauer ae74a8dbb7 Configure FieldData using a hash not a string
Closes #2876
2013-04-09 15:53:05 +02:00
Simon Willnauer 374bbbfa7b # FieldData Filter
FieldData is an in-memory representation of the term dictionary in an uninverted form. Under certain circumstances this FieldData representation can grow very large on high-cardinality fields like tokenized full-text. Depending on the use case, filtering the terms that are held in the FieldData representation can heavily improve execution performance and application stability.
FieldData Filters can be applied on a per-segment basis. During FieldData loading the terms enumeration is passed through a filter predicate that either accepts or rejects a term.

## Frequency Filter

The Frequency Filter acts as a high / low pass filter based on the document frequencies of a term within the segment that is loaded into field data. It allows rejecting terms with a very high or very low frequency, based on absolute frequencies or on percentages relative to the number of documents in the segment or, more precisely, the number of documents that have at least one value in the field being loaded in the current segment.

Here is an example mapping:

```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.frequency.min=0.001;filter.frequency.max=0.1",
                "index" : "analyzed"
            }
        }
    }
}
```
### Parameters

 * `filter.frequency.min` - the minimum document frequency (inclusive) in order to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
 * `filter.frequency.max` - the maximum document frequency (inclusive) in order to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
 * `filter.frequency.min_segment_size` - the minimum number of documents in a segment in order for the filter to be applied. Small segments might be omitted with this setting.
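
How the min / max thresholds could be interpreted is sketched below. This is an illustration under stated assumptions, not the loader's actual code: `FrequencyFilterSketch`, `resolve` and `accept` are made-up names, both thresholds are assumed to be given explicitly, and `min_segment_size` is not modeled.

```java
final class FrequencyFilterSketch {
    private final double min;
    private final double max;

    FrequencyFilterSketch(double min, double max) {
        this.min = min;
        this.max = max;
    }

    // Thresholds below 1.0 are read as a fraction of the documents that have a
    // value in the field; anything else is read as an absolute document count.
    static double resolve(double threshold, long docsWithField) {
        return threshold < 1.0 ? threshold * docsWithField : threshold;
    }

    // A term is kept when its document frequency lies within [min, max], inclusive.
    boolean accept(long docFreq, long docsWithField) {
        double lower = resolve(min, docsWithField);
        double upper = resolve(max, docsWithField);
        return docFreq >= lower && docFreq <= upper;
    }
}
```

With `min=0.001` and `max=0.1` on a segment with 10000 documents carrying the field, a term seen in 50 documents is kept, while terms seen in 5 or in 5000 documents are rejected.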

## Regular Expression Filter

The regular expression filter applies a regular expression to each term during loading and only loads terms into memory that match the given regular expression.

Here is an example mapping:

```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.regex=^en_.*",
                "index" : "analyzed"
            }
        }
    }
}
```
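
The effect of `filter.regex=^en_.*` on a set of terms can be sketched with `java.util.regex`. `RegexTermFilter` is a hypothetical stand-in for the loading step, and whether the real filter uses full-match or find semantics is an assumption here (for an anchored pattern ending in `.*` both give the same result):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class RegexTermFilter {
    // Returns only the terms that match the expression; the rest would not be
    // loaded into memory.
    static List<String> load(List<String> terms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> loaded = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                loaded.add(term);
            }
        }
        return loaded;
    }
}
```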

Closes #2874
2013-04-09 11:34:48 +02:00
Igor Motov acc0950957 Get template should return warmers
Fixes #2868
2013-04-08 19:12:20 -04:00
Simon Willnauer a10c80e20f ensure that modifications to the enum order trigger test failures since we rely on the ordinal 2013-04-08 23:29:56 +02:00
Simon Willnauer 7e77ddb88f use enum to represent flags and fail if flags are not respected 2013-04-08 22:56:11 +02:00
Igor Motov 2a588dc1f1 Fix IndexMissingException in get template request
Fixes #2873
2013-04-08 16:25:09 -04:00
Shay Banon 3120457bfe move to 0.90.0.RC3 snap 2013-04-08 05:48:29 -07:00
Shay Banon 3a8cba4d50 release 0.90.0.RC2 2013-04-08 05:46:26 -07:00
Shay Banon 5fa66cd592 Node Stats: Allow to explicitly get specific indices level node stats element
closes #2871
2013-04-07 20:22:48 -07:00
Shay Banon 15d7ae5983 FieldData Stats: Add field data stats to indices stats API
closes #2870
2013-04-07 18:30:24 -07:00
Martijn van Groningen 86c1714bf3 Also test the `fields` option. 2013-04-07 21:52:19 +02:00
Simon Willnauer 7ad03ed789 Use IndexOption.DOCS_ONLY for boolean fields
Closes #2866
2013-04-06 22:41:22 +02:00
Shay Banon 9f6c8c88f3 improve on shard level filter/id cache stats
use just the removal listener and back to the IndexReader#coreCacheKey as the actual field as part of the cache key
2013-04-06 00:02:42 +02:00
Shay Banon 815917fbf8 confusing code..., but we can't release the searcher in a get result case
we need that searcher later on..., need to think of how to simplify that..., added a comment for now
2013-04-05 23:27:03 +02:00
Simon Willnauer 36ffd6d582 release searcher in finally block rather than relying on an exception that is thrown 2013-04-05 22:45:52 +02:00
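
The pattern behind this change, sketched with a made-up `Resource` type standing in for the searcher (none of these names come from the Elasticsearch code base). Releasing in `finally` covers both the normal return and every exceptional exit:

```java
import java.util.function.IntSupplier;

final class FinallyRelease {
    // Stand-in for a searcher that must be released exactly once after use.
    static final class Resource {
        boolean released;

        void release() {
            released = true;
        }
    }

    // Runs the work and releases the resource whether or not the work throws.
    static int useAndRelease(Resource resource, IntSupplier work) {
        try {
            return work.getAsInt();
        } finally {
            resource.release();
        }
    }
}
```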
Shay Banon 84670212a6 Filter / Id Cache Stats: Add to Indices Stats API, revise node stats API
closes #2862
2013-04-05 20:02:32 +02:00
Simon Willnauer 5e7ad9832c Added more evil tests for different field data implementations 2013-04-05 18:12:50 +02:00
Martijn van Groningen 224faffead Added an extended test for terms facet with a decent number of documents / field values and randomly tests various options. Also fixed an issue where `regex` and `excludes` were ignored when `all_terms` was used. 2013-04-05 17:38:46 +02:00
David Pilato 4b1ec037f8 Fix test for #2668. 2013-04-05 15:00:28 +02:00
Martijn van Groningen 9b5c74d43e Made sure `all_terms` works consistently. In some cases the `all_terms` option was ignored: * Faceting on number based fields. * The `execution_type` was set to `map`. * In the case the `fields` option was used.
Closes #2861
2013-04-05 14:27:19 +02:00
Shay Banon 831ea789aa rename getByOrd to getValueByOrd (to match BytesValues.WithOrdinals)
also make it public so it can be used when iterating over ords
2013-04-05 13:56:33 +02:00
Shay Banon bcc14cde9f make numeric namings consistent with bytes ones
also add the ability to get the ordinals from DoubleValues.WithOrdinals and LongValues.WithOrdinals
2013-04-05 13:33:56 +02:00
David Pilato 36b92be212 List of existing plugins with Node Info API
We want to display information about loaded plugins in Node Info API using plugin option:

```sh
curl http://localhost:9200/_nodes?plugin=true
```

For example, on a 4-node cluster, it could provide the following output:

```javascript
{
  "ok" : true,
  "cluster_name" : "test-cluster-MacBook-Air-de-David.local",
  "nodes" : {
    "lodYfbFTRnmwE6rjWGGyQQ" : {
      "name" : "node1",
      "transport_address" : "inet[/172.18.58.139:9300]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9200]",
      "plugins" : [ ]
    },
    "hJLXmY_NTrCytiIMbX4_1g" : {
      "name" : "node4",
      "transport_address" : "inet[/172.18.58.139:9303]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9203]",
      "plugins" : [ {
        "name" : "test-plugin",
        "description" : "test-plugin description",
        "site" : true,
        "jvm" : false
      }, {
        "name" : "test-no-version-plugin",
        "description" : "test-no-version-plugin description",
        "site" : true,
        "jvm" : false
      }, {
        "name" : "dummy",
        "description" : "No description found for dummy.",
        "url" : "/_plugin/dummy/",
        "site" : false,
        "jvm" : true
      } ]
    },
    "bnoySsBfTrSzbDRZ0BFHvg" : {
      "name" : "node2",
      "transport_address" : "inet[/172.18.58.139:9301]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9201]",
      "plugins" : [ {
        "name" : "dummy",
        "description" : "This is a description for a dummy test site plugin.",
        "url" : "/_plugin/dummy/",
        "site" : false,
        "jvm" : true
      } ]
    },
    "0Vwil01LSfK9YgRrMce3Ug" : {
      "name" : "node3",
      "transport_address" : "inet[/172.18.58.139:9302]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9202]",
      "plugins" : [ {
        "name" : "test-plugin",
        "description" : "test-plugin description",
        "site" : true,
        "jvm" : false
      } ]
    }
  }
}
```

Information is cached for 10 seconds by default. Modify the `plugins.info_refresh_interval` property if needed.
Setting `plugins.info_refresh_interval` to `-1` will cause infinite caching.
Setting `plugins.info_refresh_interval` to `0` will disable caching.

Closes #2668.
2013-04-05 11:36:56 +02:00
Simon Willnauer f3e6fe094a beef up term facet tests 2013-04-05 11:05:24 +02:00
Simon Willnauer 9fbe075aec Added test that compares concurrent facet execution results with a serial execution result 2013-04-05 10:36:53 +02:00
Shay Banon 5af6343697 allow to disable the optimization of removal of ords on single value numerics/geo field data
field data settings in the mappings can have ordinals=always option
2013-04-05 00:44:07 +02:00
Shay Banon 54f685674b Thread Pool: Update default settings (move from default cached to fixed)
closes #2858
2013-04-04 23:24:49 +02:00
Simon Willnauer f1dd867c4f Catch Throwable when listener is called rather than Exception to prevent possible hangs if fatal exceptions or errors are thrown 2013-04-04 22:58:38 +02:00
Shay Banon a206aa4548 Settings / Config: Allow to explicitly specify external environment variable syntax, in which case it's optional
fixes #2855
2013-04-04 16:30:24 +02:00
Simon Willnauer d758401add Cleanup ScriptDocValues. This commit adds a getValues method to all ScriptDocValues for easy access
in scripts via doc['field'].values / value.
2013-04-04 16:07:54 +02:00
Alexander Reelsen 4f96b36376 Returning configuration of root field mappers toXContent method only if they are enabled 2013-04-04 15:55:12 +02:00
Alexander Reelsen fbdf89c636 Fix for ttl fieldmapper to support disabling correctly. Also returning only booleans, not enums in toXContent 2013-04-04 12:27:23 +02:00
Alexander Reelsen 955788e9a5 Allowing to disable size field mapper after enabling 2013-04-04 09:41:41 +02:00
Alexander Reelsen e662e4d55d Allowing to disable index field mapper after enabling 2013-04-04 09:41:41 +02:00
Alexander Reelsen 9cc2563d5e Allowing to disable timestamp field mapper after enabling 2013-04-04 09:41:41 +02:00
Simon Willnauer 223ec2c42d Beef up FieldData tests by running one on one duels 2013-04-03 18:38:25 +02:00
Igor Motov 356329df00 Improve stability of ClusterHealthTests 2013-04-03 12:07:42 -04:00
Igor Motov d2f6349dcf Improve stability of MinimumMasterNodesTests 2013-04-03 11:51:28 -04:00
Martijn van Groningen 0a89c80554 Fixed issue where a doc is omitted from the hits if it has no geo point and sorting is based on geo distance.
Closes #2851
2013-04-03 17:25:16 +02:00
Simon Willnauer bbe619a416 Call onFailure for every exception case even in the case of an error / runtime exception
Closes #2848
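
A sketch of why `Throwable` has to be caught so that the listener is always notified. `ListenerDispatch` and its nested `Listener` interface are illustrative names, not the action listener API of the project:

```java
import java.util.concurrent.Callable;

final class ListenerDispatch {
    // Minimal listener shape used only for this illustration.
    interface Listener<T> {
        void onResponse(T response);

        void onFailure(Throwable failure);
    }

    // Catching Throwable (not just Exception) means Errors and runtime
    // exceptions also reach onFailure, so a waiting caller is never left hanging.
    static <T> void execute(Callable<T> action, Listener<T> listener) {
        T result;
        try {
            result = action.call();
        } catch (Throwable t) {
            listener.onFailure(t);
            return;
        }
        listener.onResponse(result);
    }
}
```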
2013-04-03 12:25:58 +02:00
Simon Willnauer eb8b38d027 Upgrade to Lucene 4.2.1 2013-04-03 12:22:39 +02:00
Martijn van Groningen af2f31c33e Fixed typo 2013-04-02 22:31:06 +02:00
Martijn van Groningen cf00acf5b0 If no specified index or alias exists and `ignore_indices` is set to `missing` an index missing error is returned instead of resolving to all open indices (e.g. when searching). This breaks backwards comp. with 0.20.x and before.
Closes #2837
2013-04-02 19:06:17 +02:00
Alexander Reelsen 0a466352cd Add support for creating a fedora RPM package with maven
Note: This has been disabled by default and is therefore not included in a
standard build. The main reason for this is, that you need to have a RPM
binary and the rpm development packages installed, which is not the case
on many systems.

The package contains an init.d-script as well as systemd configurations.

You can build your own RPM package simply by running 'mvn rpm:rpm'
2013-04-02 16:19:45 +02:00
Shay Banon 10a76ad5d8 fix seen readers counter
since clear can be called on percolator as well, we need to make sure we inc the counter even for non segment readers
2013-04-02 13:25:56 +02:00
Shay Banon 31d1e6cfe7 Field Data: Simplify field data cache settings
closes #2843
2013-04-02 12:44:39 +02:00
Alexander Reelsen d866321c55 Merge pull request #2811 from spinscale/document-mapper-merge
Allow to update ttl field mapping after initial creation. Fixes #2136
2013-04-01 23:37:29 -07:00
Simon Willnauer 7efa92636a Cut over to IntsRef in favor of IntsArrayRef 2013-03-31 10:46:21 +02:00
Simon Willnauer b3356d9f8d remove dead code 2013-03-31 10:19:17 +02:00
Simon Willnauer 2a09342405 remove Bytes.java in favor of BytesRef / ArrayUtils 2013-03-31 08:54:39 +02:00
Simon Willnauer e864d5785e optimize matcher reset to not create unnecessary string objects 2013-03-30 17:35:23 +01:00
Simon Willnauer fefa8da2ea remove StringValues in favor of BytesValues 2013-03-30 17:35:23 +01:00
Simon Willnauer dff2a9279c clean-up double values 2013-03-30 17:35:23 +01:00
Simon Willnauer d5c271acf5 clean-up long values 2013-03-30 17:35:22 +01:00
Simon Willnauer 5aedf74fb0 Remove getValues from numeric and string field data & clean up geo field data 2013-03-30 17:35:22 +01:00
Simon Willnauer 7f81469137 Refactor BytesValues to be reused as the interface for HashedBytesValues and remove HashBytesValues. 2013-03-30 17:35:22 +01:00
Simon Willnauer 129f02623b Added FST based FieldData implementation holding all data in a per segment FST.
This commit factors out a common API for BytesValues-based implementations to share code and reduce code duplication.
2013-03-30 17:35:22 +01:00
Shay Banon 72c76c2799 fail on malformed sort 2013-03-30 13:58:39 +01:00
Shay Banon 6a1cb8f61b {sort: "field"} throws misleading errors
fixes #2835
2013-03-30 13:46:53 +01:00
Martijn van Groningen 2e93329e23 If match then go to next doc 2013-03-29 16:57:42 +01:00
Martijn van Groningen a89dde8bac Fixed `bool` filter bugs:
* In the case only should clauses were specified with specific type of filters, the first clause determined which documents matched.
* In some cases the behaviour that at least one should clause must match was broken.
2013-03-29 16:48:36 +01:00
Igor Motov b657bdfa1a Optimize aliases processing
Closes #2832
2013-03-29 10:44:45 -04:00
Alexander Reelsen a880a6c85e Allow to update ttl field mapping after initial creation. Fixes #2136
Adding possibility to change TTL field mapper data without specifying enabled flag in mapping update
2013-03-28 17:25:28 +01:00
Martijn van Groningen 941aa17a43 Added sort mode to geo distance sorting. Closes #1846 2013-03-28 17:04:42 +01:00
Igor Motov 9bc50ea609 Fix LeastUsedDistributor and ensure random distribution for multiple non-fs directories
If we cannot determine available space the fallback scenario is now to use random distribution instead of always using the last directory.

Fixes #2820
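
The fallback described above can be sketched as follows. This is a simplified illustration, not the LeastUsedDistributor implementation: `LeastUsedSketch.pick` is a made-up function, and a value of `0` or less is assumed to mean that the available space could not be determined.

```java
import java.util.Random;

final class LeastUsedSketch {
    // Picks the directory with the most usable space. If no directory reports
    // a usable size, a random index is returned instead of always the last one.
    static int pick(long[] usableBytes, Random random) {
        int best = -1;
        for (int i = 0; i < usableBytes.length; i++) {
            if (usableBytes[i] > 0 && (best < 0 || usableBytes[i] > usableBytes[best])) {
                best = i;
            }
        }
        return best >= 0 ? best : random.nextInt(usableBytes.length);
    }
}
```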
2013-03-28 11:08:54 -04:00
Shay Banon 1fc37e5954 Segments API: Add version & compound for each segment
closes #2823
2013-03-28 15:34:38 +01:00
Shay Banon 473473e867 remove the field settings for query parser cache, not really relevant 2013-03-27 20:39:36 +01:00
Shay Banon c18c609af1 Date math in query_string caches now()
fixes #2808
2013-03-27 20:32:38 +01:00
Igor Motov 5bb75f9da3 Move applying alias filter to ContextSearch#preProcess() 2013-03-27 09:23:54 -04:00
Simon Willnauer 17f83f33bb Terminate early when no terms left in the suggest string.
Closes #2817
2013-03-26 17:44:34 +01:00
Igor Motov 9ae421a8b2 Fix filtering aliases with non-empty sort options
Fixes #2816
2013-03-26 07:23:44 -04:00
Shay Banon d35a3b03c8 Warmers: Have an explicit warmer thread pool
add 1 in case there is 1 core...
closes #2815
2013-03-25 23:34:52 +01:00
Simon Willnauer aa97c031f2 Don't reset tokenstream before passing to the MemoryIndex, otherwise some tokenizer might swallow tokens.
Closes #2814
2013-03-25 22:46:11 +01:00
Shay Banon b7106622d8 Warmers: Have an explicit warmer thread pool
Have an explicit warmer threadpool that is dedicated to executing warmers. Currently, warmers use the search threadpool, which does not work well: the number of concurrent searches should be separate from the number of concurrent warmers allowed, and the characteristics of the search pool (for example, a bounded queue_size) might not fit how warmers should be executed (they should not be "rejected").

closes #2815
2013-03-25 16:46:37 +01:00
Shay Banon 0e815ce11c add 0.20.7 2013-03-25 12:33:55 +01:00
adavis 6a93fbcf07 Adding parsing for zero terms query for multi match
Tests for multi-match zero_terms_query and making references to the ZeroTermsQuery enum consistent to others used in MultiMatchQueryBuilder
2013-03-23 08:59:39 +01:00
adavis 3f83904680 Fixes java6_u31 compile error w.r.t. type inference 2013-03-22 16:46:42 -07:00
Simon Willnauer 560d2c094e Fix issue where entire shards are lost due to too many open files exceptions and a bug in Lucene's
IndexWriter / DirectoryReader where an existing index was not detected and then deleted due to a wrong
creation mode. See LUCENE-4870

Closes #2812
2013-03-22 17:18:55 +01:00
Florian Schilling 1a67793a4b Added Script test for geo distance tests and modified GeoUtils.normalizePoint() 2013-03-22 13:34:18 +01:00
Simon Willnauer 075779a397 Call onMissing if doc has no value in the field.
Closes #2807
2013-03-21 22:45:17 +01:00
Simon Willnauer 064d272916 Respect offset and length when iterating over BytesRef in Uid. The length is starting at offset
Closes #2806
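
The kind of bug this addresses can be sketched on a plain byte array, without depending on Lucene. `SliceIteration.indexOf` is a hypothetical helper; the point is that the valid range of a slice is `[offset, offset + length)`, not `[0, length)`:

```java
final class SliceIteration {
    // Returns the index of the first delimiter inside the slice
    // bytes[offset, offset + length), or -1 if there is none.
    static int indexOf(byte[] bytes, int offset, int length, byte delimiter) {
        int end = offset + length; // length is counted from offset, not from 0
        for (int i = offset; i < end; i++) {
            if (bytes[i] == delimiter) {
                return i;
            }
        }
        return -1;
    }
}
```

For the bytes of `zz#type#id` with `offset = 3` and `length = 7`, the slice is `type#id` and the delimiter `#` is found at index 7; a loop from `0` to `length` would wrongly report index 2, which lies outside the slice.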
2013-03-21 19:29:05 +01:00
Simon Willnauer 5f05c2106f Use more efficient StemmerOverrideFilter from Lucene trunk
Closes #2800
2013-03-21 07:58:51 +01:00
Shay Banon ea698add72 move to 0.90.0.RC2 snap 2013-03-20 19:06:30 +01:00
Shay Banon a2f14b68e8 release 0.90.0.RC1 2013-03-20 19:05:08 +01:00
Florian Schilling f08d458545 # GeoShape Precision
The `geo_shape` precision could only be set via `tree_levels` so far. A new option `precision` now allows the levels of the underlying tree structures to be set by distances like `50m`. The def

## Example
```sh
curl -XPUT 'http://127.0.0.1:9200/myindex/' -d '{
  "mappings" : {
      "type1": {
          "dynamic": "false",
          "properties": {
              "location" : {
                  "type" : "geo_shape",
                  "geohash" : "true",
                  "store" : "yes",
                  "precision":"50m"
              }
          }
      }
  }
}'
```

## Changes
- GeoUtils defines the [WGS84](http://en.wikipedia.org/wiki/WGS84) reference ellipsoid of earth
- DistanceUnits refer to a more precise definition of earth circumference
- DistanceUnits for inch, yard and meter have been defined
- Set default levels in GeoShapeFieldMapper to 50m precision

Closes #2803
2013-03-20 14:52:47 +01:00
Simon Willnauer 4705eb2959 Lazily initialize the delegate in BloomFilteredPostingsFormat to prevent unnecessary loading if bloomfilter terminates early 2013-03-20 12:43:17 +01:00
Simon Willnauer 747ce36915 Specialise the default codec to reuse Lucene41 files in the common case.
Closes #2799
2013-03-20 12:43:17 +01:00
Shay Banon 54e7e309a5 better comment... 2013-03-19 14:36:13 +01:00
Shay Banon d5beea4bba if multicast socket closes, try and restart it
also, throttle on socket failures, so it won't spin out of control...
relates to #2783
2013-03-19 11:20:47 +01:00
Shay Banon f4a212420b multicastSocket should be volatile as well... 2013-03-19 10:23:39 +01:00
Shay Banon c92207f483 broadcast API to by default ignore missing / illegal shard state
this happens for example because we list assigned shards, and they might not have been allocated on the relevant node yet, no need to list those as actual failures in some APIs
2013-03-19 10:22:43 +01:00
Shay Banon aca713d68e tar.gz distro by mistake includes a windows lib 2013-03-18 22:46:04 +01:00
Shay Banon 566d1d13f7 fix javadoc 2013-03-18 22:04:31 +01:00
Clinton Gormley 2123ab591c Correct filter strategy opt: random_access_random to random_access_always 2013-03-18 20:17:26 +01:00
Shay Banon 7d9cef904b Field Data: optimize long type to use narrowest possible type automatically
closes #2795
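
Choosing the narrowest storage type from the observed value range can be sketched like this. `NarrowestType` and its `Width` enum are illustrative names; the real field data loaders are not shown here:

```java
final class NarrowestType {
    enum Width { BYTE, SHORT, INT, LONG }

    // Scans the values once and returns the smallest integer width that can
    // hold both the minimum and the maximum value.
    static Width narrowest(long[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (min >= Byte.MIN_VALUE && max <= Byte.MAX_VALUE) {
            return Width.BYTE;
        }
        if (min >= Short.MIN_VALUE && max <= Short.MAX_VALUE) {
            return Width.SHORT;
        }
        if (min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE) {
            return Width.INT;
        }
        return Width.LONG;
    }
}
```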
2013-03-18 12:37:15 +01:00
Shay Banon 82072fc47f make ES compile with java 8
- that isAnnotationPresent bug is known, and probably will be fixed in later versions, but it costs us nothing to not use it now
- some tests fail, mainly due to consistent ordering expected from Map (within versions) which does not seem to be preserved, need to fix those tests to be agnostic to it
2013-03-18 01:33:09 +01:00
Shay Banon e347a626da use ImmutableList.Builder instead of ArrayList 2013-03-17 21:55:07 +01:00
Shay Banon 2ed6ea25cc fix logging message to include the index
also add the list of current indices
2013-03-16 22:58:45 +01:00
Shay Banon 111a13222e Mapping: dynamic flag is explicitly returned even when not set
fixes #2789
2013-03-16 01:29:22 +01:00
Simon Willnauer c25eb7defe Fix bug in RateLimiter.SimpleRateLimiter causing numeric overflow in StoreStats
Closes #2785
2013-03-15 23:36:31 +01:00
Shay Banon d5da8f22ff improve TODO comment 2013-03-15 21:46:02 +01:00
Simon Willnauer 0e3b88be35 add CamelCase support to Suggester where missing 2013-03-15 15:07:15 +01:00
Simon Willnauer e0eff7d9d3 Remove `sort_order` and `sort_mode` in favor of `order` and `mode`
Closes #2781
2013-03-15 13:57:39 +01:00
Simon Willnauer 33608c333f Add `sort_order` and `sortOrder` as valid field names for defining the sort order in a Sort object.
Closes #2767
2013-03-15 08:42:19 +01:00
Simon Willnauer 5f20d81199 Make StupidBackoff the default smoothing model for phrase suggester
Closes #2780
2013-03-14 23:03:15 +01:00
Shay Banon 91c51ef05c minor cleanup suggest api
- make sure we close the parser
- fail when no content is provided in the rest request
- reuse the suggest parse element
2013-03-13 12:18:14 -07:00
Florian Schilling 25bd9cecd0 # REST Suggester API
The REST Suggester API binds the 'Suggest API' to the REST Layer directly. Hence there is no need to touch the query layer for requesting suggestions.
This API extracts the Phrase Suggester API and makes 'suggestion request' top-level objects in suggestion requests. The complete API can be found in the
underlying ["Suggest Feature API"](http://www.elasticsearch.org/guide/reference/api/search/suggest.html).

# API Example
The following example shows how Suggest Actions work on the REST layer, using a simple request and its response.

## Suggestion Request
```json
curl -XPOST 'localhost:9200/_suggest?pretty=true' -d '{
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
        "phrase" : {
            "analyzer" : "bigram",
            "field" : "bigram",
            "size" : 1,
            "real_word_error_likelihood" : 0.95,
            "max_errors" : 0.5,
            "gram_size" : 2
        }
    }
}'
```
This example shows how to query a suggestion for the global text 'Xor the Got-Jewel'. A 'simple phrase' suggestion is requested and
a 'direct generator' is configured to generate the candidates.

## Suggestion Response
On success the request above will reply with a response like the following:
```json
{
    "simple_phrase" : [ {
        "text" : "Xor the Got-Jewel",
        "offset" : 0,
        "length" : 17,
        "options" : [ {
            "text" : "xorr the the got got jewel",
            "score" : 3.5283546E-4
        } ]
    } ]
}
```
The suggest response contains a single 'simple phrase' entry, which in turn contains one 'option'. This option represents a suggestion for the
queried text. It contains the corrected text and a score indicating how likely this option is the intended text.
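
As a rough illustration of consuming such a response on the client side, the following Python sketch picks the highest-scoring option for each named suggestion. The helper name `best_suggestions` is hypothetical and not part of any Elasticsearch client; the sample data is copied from the response above.

```python
def best_suggestions(response: dict) -> dict:
    """Map each named suggestion to the text of its best option (or None)."""
    best = {}
    for name, entries in response.items():
        # Skip anything that is not a list of suggestion entries
        # (an assumption: other top-level keys may be present).
        if not isinstance(entries, list):
            continue
        options = [opt for entry in entries for opt in entry.get("options", [])]
        if options:
            best[name] = max(options, key=lambda opt: opt["score"])["text"]
        else:
            best[name] = None
    return best


# Sample copied from the suggestion response shown above.
SAMPLE = {
    "simple_phrase": [{
        "text": "Xor the Got-Jewel",
        "offset": 0,
        "length": 17,
        "options": [{
            "text": "xorr the the got got jewel",
            "score": 3.5283546e-4,
        }],
    }]
}

print(best_suggestions(SAMPLE))
```

An entry with an empty `options` list maps to `None`, so callers can tell "no correction found" apart from a missing suggestion name.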

Closes #2774
2013-03-13 19:36:29 +01:00
Jörg Prante a127f2d2e8 avoiding NPE in Sigar FS 2013-03-13 10:05:59 -07:00
Alexander Reelsen 125b33d3dc GeoJSONShapeParser parses JSON correctly and extracts coordinates even if 'crs' field is included.
Fixes #2763
2013-03-13 15:17:21 +01:00
Simon Willnauer 365cde82d3 Use numOrds rather than numDocs as upperbound for sorting
Closes #2773
2013-03-13 15:13:56 +01:00
Clinton Gormley 93ca6e2c4b tieBreaker in MultiMatchQueryBuilder should be a float, not an integer
Closes #2772
2013-03-13 13:44:59 +01:00
Shay Banon 5ed9fb2c54 support also mode in search sorting, and fail on illegal parameters 2013-03-12 15:16:25 -07:00
Shay Banon 55ceb01c44 force close connection if its on a connect failure
relates to Repeated ConnectExceptions in logs until node is restarted, fixes #2766
2013-03-12 14:49:07 -07:00
Shay Banon 877105ee19 no need for specific time / empty based classes, just as final fields 2013-03-12 12:32:05 -07:00
Igor Motov 3a534c64e5 Add dynamic settings validation
Fixes #2749
2013-03-12 14:41:00 -04:00
Simon Willnauer c008c59927 add missing license header 2013-03-12 14:57:47 +01:00
Simon Willnauer c5395436e6 fix test bug where a small time window exists that can trigger a false failure due to default concurrent recoveries 2013-03-12 14:48:14 +01:00
Simon Willnauer 237c4ddf54 Introduce ParentIdCollector that collects only if the parent ID is non-null, i.e. if the document has a parent.
Closes #2744
2013-03-11 21:11:05 +01:00
Clinton Gormley 7961dfa7ab Fixed a typo in an error message "should exists" -> "should exist" 2013-03-11 15:37:20 +01:00
Simon Willnauer 9442c41481 enable testcase that relied on a Lucene 4.2 fix 2013-03-11 12:57:24 +01:00
Simon Willnauer 4e7cff488e add test that ensures that we bump the version on a Lucene Upgrade 2013-03-11 10:30:48 +01:00
Simon Willnauer ebadd9ebbd Fix tests since Lucene 4.2 we can support date math in Fuzzy-Search Syntax 2013-03-11 08:23:01 +01:00
Simon Willnauer a37f1f55cc Add tests for highlighting boost query.
Closes #1314
2013-03-11 08:23:01 +01:00
Simon Willnauer 11bf7a8b1a Upgrade to Lucene 4.2 2013-03-11 08:23:01 +01:00
Simon Willnauer 75fd6d4985 Added KeywordRepeatFilter that allows emitting stemmed and unstemmed versions of the same token if stemmers are used
Closes #2753
2013-03-09 23:09:59 +01:00
Simon Willnauer dc9a052287 Respect CandidateGenerator#size if set in the request and reduce the total #of candidates to the shard size.
Closes #2752
2013-03-09 13:36:40 +01:00
Shay Banon cc6c07365c has_child query AVG score mode does not always work correctly
fixes #2750
2013-03-08 08:50:11 -08:00
Shay Banon eb956e7c09 Term/Terms filters on numeric fields gives wrong result
fixes #2746
2013-03-07 22:12:22 -08:00
Shay Banon c298c19177 don't use cache for ordinals for small max ord 2013-03-07 08:45:01 -08:00
Simon Willnauer 2c8d8ef8e0 check for null on setters that must not be null in IndicesReplicationOperationRequest 2013-03-07 10:18:10 +01:00
Benjamin Devèze 35f5ca915d Add support for ignore_indices to delete by query
Closes #2734
2013-03-07 10:17:51 +01:00
Simon Willnauer 12a2808168 exhaust object to allow subsequent objects to be parsed correctly 2013-03-06 15:34:59 +01:00
Simon Willnauer 1f217f6a7b Move smoothing model into its own sub-object in the PhraseSuggest request
Closes #2735
2013-03-06 14:31:21 +01:00
Shay Banon e1409a9f0e Problems with range searches for time with lte
fixes #2731
2013-03-05 18:10:30 -08:00
Shay Banon 9a25867bfe Network: A closed channel might not always fire up a close event
fixes #2733
2013-03-05 11:49:10 -08:00
Igor Motov acff102234 Implement search shards API
Closes #2726
2013-03-05 09:17:59 -05:00
Simon Willnauer 1eb24d7efc use a base ShingleFilterFactory to simplify default shingle detection 2013-03-05 12:32:50 +01:00
Simon Willnauer 0f95499703 if word scorer is on unigram make sure we score the current position not position 0 2013-03-05 12:31:32 +01:00
Simon Willnauer 876b5a3dcd prefer totalTermFrequency over docFreq in PhraseSuggester 2013-03-05 10:46:25 +01:00
Simon Willnauer 315744be55 Set shardSize according to the total size if not explicitly specified. Closes #2729 2013-03-05 09:22:23 +01:00
Shay Banon 3e264f6b95 cleanup deletion of content in shards
we are very conservative on when we delete data, remove the actual options of deleting data that are not used
2013-03-04 20:41:19 -08:00
Shay Banon 1ed07c1794 add a list of files that exists in the index to the failure 2013-03-04 18:15:06 -08:00
Shay Banon d609571897 add close method to field data 2013-03-04 16:42:29 -08:00
Shay Banon cfd8bddde4 Remove JMX connector creation flags, and JMX attributes
closes #2728
2013-03-04 16:12:18 -08:00
Shay Banon 774622abfb Change field data stats header from `field_data` to `fielddata`.
fixes #2727
2013-03-04 23:50:33 +01:00
Shay Banon d2dc672f43 allow to specify a list of settings to get a value for 2013-03-04 23:41:43 +01:00
Drew Raines a8d52b58b6 Remove obsolete test. 2013-03-04 15:22:40 -06:00
Andrii Gakhov dc28151ad7 fixed interchanged values in field_data stats fixes #2724 2013-03-04 11:19:33 +01:00
Shay Banon a1b2434339 revert change on listing plugins on /_plugin
we should provide it as part of nodes info
relates to #2664
2013-03-03 21:52:44 +01:00
Shay Banon a7da27c714 Field Data: Add `node` level cache type
closes #2722
2013-03-03 19:55:06 +01:00
Shay Banon e01879a698 add evictions stats to field data 2013-03-03 18:41:17 +01:00
Simon Willnauer e9ba98913b simplify searchShard selection when routing is present 2013-03-03 14:32:19 +01:00
Benjamin Devèze 09f20e3d4c Fix bug when searching concrete and routing aliased indices
Closes #2683
2013-03-03 14:31:57 +01:00
uboness 881cb7900c Change geo_shapes support:
* Exposed the spatial strategy to be configurable as part of the geo_shape mappings
* Exposed the spatial strategy to be customizable at query time (will be used to generate the geo_shape filter/query)
* Removed XTermQueryPrefixTreeStrategy and reverted to use the lucene TermQueryPrefixTreeStrategy instead
* Made the RecursivePrefixTreeStrategy the default strategy to be used
* Removed support for all spatial operations except "intersects"
* Updated both the GeoShapeQueryBuilder and GeoShapeFilterBuilder with all the changes (removed the option of specifying the operation type (as only intersects is supported) and added the option of setting the filter/query spatial strategy

Closes #2720
2013-03-02 17:13:58 +01:00
Simon Willnauer b9513511e0 Check for null query on Percolator query loading and omit the query if it can't be parsed.
Closes #2547
2013-03-02 16:55:39 +01:00
Shay Banon 0be5a7888f fix local flag in cluster health 2013-03-02 16:00:10 +01:00
Shay Banon 5dd18acd0e proper reason for cluster state task 2013-03-02 15:48:01 +01:00
Shay Banon 50d121315b add ability for cluster health to wait for current events to be processed
help with tests that run on slow machines
2013-03-02 14:25:45 +01:00
tristanbuckner 9273d76cdf Make BoolFilterBuilder output proper json 2013-03-02 01:07:50 +01:00
Shay Banon ea097afd91 add proper testing for bool filter 2013-03-02 01:07:05 +01:00
Shay Banon 361d6bf89a spin a bit to wait for condition in test, so slow machines will still run it correctly 2013-03-01 23:36:13 +01:00
Shay Banon fe8b3725bb lazy set the indices on the search request now that its validated 2013-03-01 22:45:59 +01:00
Shay Banon 6687ecb038 Query DSL: Filtered query to make query optional (defaults to match_all)
closes #2718
2013-03-01 22:40:22 +01:00
Matt Weber dfd92265b7 Correct order of routing and parent params for Get
The order in which routing and parent parameters are set is important.  The
routing parameter must be set first or it will overwrite the parent routing
value.
2013-03-01 22:24:14 +01:00
Shay Banon 2eea99255d Analyze API returns in YAML format if analyzed string begins with ---
fixes #2624
2013-03-01 22:17:09 +01:00
Shay Banon 9b68e98ea2 more strict check before trying to parse and detect a string as a date
fixes #2694
2013-03-01 22:15:32 +01:00
Jeremy Jongsma d16efbe47f Throw correct ClassNotFoundException to debug classloader issues 2013-03-01 21:56:59 +01:00
Simon Willnauer aaa3c48b3c Throw IAE if indices is null or contains a null value.
Closes #2656
2013-03-01 21:26:23 +01:00
Simon Willnauer fced68c22d ensure that suggestion only added on reduce if they are present in the shard response 2013-03-01 21:09:10 +01:00
Martijn van Groningen d99b532f0f Supporting sort modes `avg` and `sum` when sorting inside nested objects.
Prior to this commit, sort mode `min` or `max` (depending on sort order) was used even when sort mode `avg` or `sum` was picked.

Closes #2701
2013-03-01 19:53:20 +01:00
Simon Willnauer 39f362326e Short-circuit response if no indices exist and make sure the listener is notified.
Closes #2692
2013-03-01 15:15:56 +01:00
Simon Willnauer 3c1f291801 Fail in metadata parsing if the id path is not a value but rather an array or an object.
Closes #2275
2013-03-01 13:00:29 +01:00
Simon Willnauer b03f3fcd6c throw IAE if fieldname is null - Closes #2711 2013-03-01 12:10:07 +01:00
Simon Willnauer 9c3898900d always use the max score across the shards in suggest response 2013-03-01 12:09:29 +01:00
Shay Banon 30075bb6f9 add info in test for actual search failures 2013-03-01 00:00:09 +01:00
Shay Banon 849a3677cd improve timing in test to wait for state with graceful timeouts
(yet, validate early and exit when relevant)
2013-02-28 23:44:52 +01:00
Simon Willnauer c90c5cbf85 fix bug in StupidBackoffScorer where previous word and current word were flipped creating non-existing bigram 2013-02-28 21:23:41 +01:00
Simon Willnauer b4b3e350a6 Expose _explain via POST
Closes #2710
2013-02-28 18:19:08 +01:00
Simon Willnauer d4ec03ed76 # Phrase Suggester
The `term` suggester provides a very convenient API to access word alternatives on a per-token
basis within a certain string distance. The API allows accessing each token in the stream
individually, while suggestion selection is left to the API consumer. Often, however, already ranked
and selected suggestions are required in order to present them to the end user.
Inside ElasticSearch we can quickly access far more statistics and information
to make better decisions about which token alternative to pick, or whether to pick an alternative at all.

This `phrase` suggester adds some logic on top of the `term` suggester to select entire
corrected phrases instead of individual tokens, weighted based on *n-gram language models*. In practice it
will be able to make better decisions about which tokens to pick based on co-occurrence and frequencies.
The current implementation is kept quite general and leaves room for future improvements.

# API Example

The `phrase` request is defined alongside the query part in the JSON request:

```json
curl -s -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "body",
        "field" : "bigram",
        "size" : 1,
        "real_word_error_likelihood" : 0.95,
        "max_errors" : 0.5,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "body",
          "suggest_mode" : "always",
          "min_word_len" : 1
        } ]
      }
    }
  }
}'
```

The response contains suggestions sorted by the most likely spell correction first. In this case we got the expected correction
`xorr the god jewel` first, while the second correction is less conservative: only one of the errors is corrected. Note that the request
is executed with `max_errors` set to `0.5`, so 50% of the terms can contain misspellings (see parameter descriptions below).

```json
  {
  "took" : 37,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2938,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "simple_phrase" : [ {
      "text" : "Xor the Got-Jewel",
      "offset" : 0,
      "length" : 17,
      "options" : [ {
        "text" : "xorr the god jewel",
        "score" : 0.17877324
      }, {
        "text" : "xor the god jewel",
        "score" : 0.14231323
      } ]
    } ]
  }
}
```

# Phrase suggest API

## Basic parameters

* `field` - the name of the field used to do n-gram lookups for the language model. The suggester will use this field to gain statistics to score corrections.
* `gram_size` - sets the max size of the n-grams (shingles) in the `field`. If the field doesn't contain n-grams (shingles) this should be omitted or set to `1`.
* `real_word_error_likelihood` - the likelihood of a term being misspelled even if the term exists in the dictionary. The default is `0.95`, corresponding to 5% of the real words being misspelled.
* `confidence` - The confidence level defines a factor applied to the input phrase's score, which is used as a threshold for other suggest candidates. Only candidates that score higher than the threshold will be included in the result. For instance, a confidence level of `1.0` will only return suggestions that score higher than the input phrase. If set to `0.0` the top N candidates are returned. The default is `1.0`.
* `max_errors` - the maximum percentage of the terms that can be considered misspellings in order to form a correction. This option accepts a float value in the range `[0..1)` as a fraction of the actual query terms, or a number `>=1` as an absolute number of query terms. The default is `1.0`, which means that only corrections with at most 1 misspelled term are returned.
* `separator` - the separator that is used to separate terms in the bigram field. If not set, the whitespace character is used as a separator.
* `size` - the number of candidates that are generated for each individual query term. Low numbers like `3` or `5` typically produce good results. Raising this can bring up terms with higher edit distances. The default is `5`.
* `analyzer` - Sets the analyzer to analyze the suggest text with. Defaults to the search analyzer of the suggest field passed via `field`.
* `shard_size` - Sets the maximum number of suggested terms to be retrieved from each individual shard. During the reduce phase, only the top N suggestions are returned based on the `size` option. Defaults to `5`.
* `text` - Sets the text / query to provide suggestions for.
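
To make the `max_errors` and `confidence` semantics above concrete, here is a minimal Python sketch. Both function names are hypothetical, and the rounding applied to fractional `max_errors` values (flooring) is an assumption of this sketch rather than something the description above specifies.

```python
import math


def allowed_misspellings(max_errors: float, num_terms: int) -> int:
    """Interpret `max_errors` for a query with `num_terms` terms."""
    if max_errors < 0:
        raise ValueError("max_errors must be non-negative")
    if max_errors < 1.0:
        # Values in [0..1) are a fraction of the query terms.
        # Flooring is an assumption made for this sketch.
        return math.floor(max_errors * num_terms)
    # Values >= 1 are an absolute number of terms.
    return int(max_errors)


def passes_confidence(candidate_score: float, input_score: float,
                      confidence: float = 1.0) -> bool:
    """A candidate is kept only if it beats confidence * input phrase score."""
    return candidate_score > confidence * input_score


print(allowed_misspellings(0.5, 4))   # fraction of a 4-term query
print(allowed_misspellings(1.0, 4))   # absolute count
print(passes_confidence(0.15, 0.1, confidence=2.0))
```

With `confidence` set to `0.0` the threshold becomes zero, so every candidate with a positive score passes and only the `size` limit applies.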

## Smoothing Models
The `phrase` suggester supports multiple smoothing models to balance weight between infrequent grams (grams (shingles) that do not exist in the index) and frequent grams (grams that appear at least once in the index).
* `laplace` - the default model that uses an additive smoothing model where a constant (typically `1.0` or smaller) is added to all counts to balance weights. The default `alpha` is `0.5`.
* `stupid_backoff` - a simple backoff model that backs off to lower order n-gram models if the higher order count is `0` and discounts the lower order n-gram model by a constant factor. The default `discount` is `0.4`.
* `linear_interpolation` - a smoothing model that takes the weighted mean of the unigrams, bigrams and trigrams based on user supplied weights (lambdas). Linear Interpolation doesn't have any default values. All parameters (`trigram_lambda`, `bigram_lambda`, `unigram_lambda`) must be supplied.
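
The three models can be sketched on toy counts as follows. This is a textbook-style illustration restricted to bigrams, not the suggester's actual scoring code: the counts are invented, the function names are hypothetical, and the interpolation omits the trigram term for brevity.

```python
from collections import Counter

# Toy counts, invented for illustration only.
UNIGRAMS = Counter({"the": 4, "god": 2, "got": 1, "jewel": 2})
BIGRAMS = Counter({("the", "god"): 2, ("god", "jewel"): 2})
TOTAL = sum(UNIGRAMS.values())  # total number of unigram occurrences
VOCAB = len(UNIGRAMS)           # vocabulary size


def laplace(prev: str, word: str, alpha: float = 0.5) -> float:
    """Additive smoothing: add `alpha` to every bigram count."""
    return (BIGRAMS[(prev, word)] + alpha) / (UNIGRAMS[prev] + alpha * VOCAB)


def stupid_backoff(prev: str, word: str, discount: float = 0.4) -> float:
    """Use the bigram estimate if seen, else a discounted unigram estimate."""
    if BIGRAMS[(prev, word)] > 0:
        return BIGRAMS[(prev, word)] / UNIGRAMS[prev]
    return discount * UNIGRAMS[word] / TOTAL


def linear_interpolation(prev: str, word: str,
                         bigram_lambda: float, unigram_lambda: float) -> float:
    """Weighted mean of the bigram and unigram estimates."""
    bigram = BIGRAMS[(prev, word)] / UNIGRAMS[prev] if UNIGRAMS[prev] else 0.0
    unigram = UNIGRAMS[word] / TOTAL
    return bigram_lambda * bigram + unigram_lambda * unigram


print(stupid_backoff("the", "god"))  # seen bigram
print(stupid_backoff("the", "got"))  # unseen bigram, backs off to the unigram
print(laplace("the", "got"))         # unseen bigram still gets a non-zero score
```

The key difference is how an unseen bigram such as `the got` is handled: `laplace` gives it a small share of the probability mass, while `stupid_backoff` falls back to the discounted unigram frequency of `got`.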

## Candidate Generators
The `phrase` suggester uses candidate generators to produce a list of possible terms per term in the given text. A single candidate generator is similar to a `term` suggester called for each individual term in the text. The output of the generators is subsequently scored in combination with the candidates from the other terms to form suggestion candidates.
Currently only one type of candidate generator is supported, the `direct_generator`. The Phrase suggest API accepts a list of generators under the key `direct_generator`; each of the generators in the list is called per term in the original text.

## Direct Generators

The direct generators support the following parameters:

* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum corrections to be returned per suggest text token.
* `suggest_mode` - The suggest mode controls which suggestions are included, or for which suggest text terms suggestions should be made. Three possible values can be specified:
 * `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
 * `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
 * `always` - Suggest any matching suggestions based on terms in the suggest text.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimum number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur at the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `max_inspections` - A factor that is used to multiply with the `shard_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum threshold in number of documents a suggest text token can exist in, in order to be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified then it cannot be fractional. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly; on top of this, it also improves the spellcheck performance. The shard level document frequencies are used for this option.
* `pre_filter` - a filter (analyzer) that is applied to each of the tokens passed to this candidate generator. This filter is applied to the original token before candidates are generated. (optional)
* `post_filter` - a filter (analyzer) that is applied to each of the generated tokens before they are passed to the actual phrase scorer. (optional)
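
The dual "relative or absolute" meaning of `threshold_frequency` and `max_query_frequency` can be sketched like this. The helper name `resolve_doc_frequency` is hypothetical, and rounding relative values up with `ceil` is an assumption of this sketch; the description above does not say how fractions of a document are rounded.

```python
import math


def resolve_doc_frequency(value: float, num_docs: int) -> int:
    """Turn a frequency setting into an absolute document count.

    Values below 1 are treated as a fraction of `num_docs`;
    values of 1 or more must be whole numbers and are used as-is.
    """
    if value < 0:
        raise ValueError("frequency must be non-negative")
    if value >= 1:
        if value != int(value):
            raise ValueError("values of 1 or more cannot be fractional")
        return int(value)
    # Rounding up is an assumption made for this sketch.
    return math.ceil(value * num_docs)


print(resolve_doc_frequency(0.25, 200))  # relative: a quarter of the documents
print(resolve_doc_frequency(3, 200))     # absolute: three documents
```

Since the shard level document frequencies are used, `num_docs` here stands for the document count of a single shard, not of the whole index.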

The following example shows a `phrase` suggest call with two generators. The first one uses a field containing ordinary indexed terms, and the second one uses a field whose
terms are indexed with a `reverse` filter (tokens are indexed in reverse order). This is used to overcome the limitation that direct generators require a constant prefix to provide high-performance suggestions. The `pre_filter` and `post_filter` options accept ordinary analyzer names.

```json
curl -s -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "body",
        "field" : "bigram",
        "size" : 4,
        "real_word_error_likelihood" : 0.95,
        "confidence" : 2.0,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "body",
          "suggest_mode" : "always",
          "min_word_len" : 1
        }, {
          "field" : "reverse",
          "suggest_mode" : "always",
          "min_word_len" : 1,
          "pre_filter" : "reverse",
          "post_filter" : "reverse"
        } ]
      }
    }
  }
}'
```
```
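
Why the reversed field helps can be shown with a small Python sketch. It is a toy model, not the suggester's implementation: the term list is invented, the function names are hypothetical, and plain Levenshtein distance stands in for whatever distance measure the real generator uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def direct_candidates(token, terms, min_prefix=1, max_edits=2):
    """Candidates must share the first `min_prefix` characters with the token."""
    prefix = token[:min_prefix]
    return sorted(t for t in terms
                  if t != token and t.startswith(prefix)
                  and edit_distance(token, t) <= max_edits)


def reversed_candidates(token, terms, min_prefix=1, max_edits=2):
    """Same lookup against reversed terms, mimicking the `reverse` field."""
    reversed_terms = [t[::-1] for t in terms]   # stands in for the reversed field
    found = direct_candidates(token[::-1], reversed_terms,  # pre_filter: reverse
                              min_prefix, max_edits)
    return sorted(c[::-1] for c in found)       # post_filter: reverse back


TERMS = ["the", "god", "got", "jewel"]
print(direct_candidates("kewel", TERMS))    # first letter is wrong: nothing found
print(reversed_candidates("kewel", TERMS))  # suffix matches: correction found
```

A misspelling in the first character defeats the prefix requirement on the ordinary field, while on the reversed field the same requirement applies to the end of the word, which is intact in this case.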

`pre_filter` and `post_filter` can also be used to inject synonyms after candidates are generated. For instance, for the query `captain usq` we might generate a candidate `usa` for the term `usq`, which is a synonym for `america`. This allows presenting `captain america` to the user if this phrase scores high enough.

Closes #2709
2013-02-28 16:17:59 +01:00
Shay Banon 2bc624806d not bytes... 2013-02-28 16:02:38 +01:00
Shay Banon 7400c30eba fail a shard if a merge failure occurs 2013-02-27 23:44:55 +01:00
Shay Banon e908c723f1 don't log merge failures twice 2013-02-27 20:23:40 +01:00
Simon Willnauer 7be8f431d5 move id tests into SimpleQueryTests 2013-02-27 19:03:42 +01:00
Simon Willnauer 8ab602ec81 Fix AIOOB exception in UID type/id tuple creation.
Closes #2695
2013-02-27 18:58:27 +01:00
Shay Banon 3b2d403292 malformed elasticsearch.yml causes unresponsive hang
fixes #2693
2013-02-27 18:58:08 +01:00
Drew Raines cb7a569f4b Include preference in _count serialization and builder. [#2698] 2013-02-27 08:15:02 -06:00
Martijn van Groningen ffbdc0a4c3 Updated postings format jdocs 2013-02-27 10:46:55 +01:00
Drew Raines b53a8aff6a Allow _count to take preference parameter. [#2698] 2013-02-26 16:24:52 -06:00
Shay Banon 1e937fd5d1 Allow index: "no" for _type
fixes #2696
2013-02-26 22:06:52 +01:00
Martijn van Groningen 7c53d22ce9 Moved resolveClosestNestedObjectMapper to MapperService 2013-02-26 17:48:02 +01:00
Igor Motov de243493c9 Changing dynamic index and cluster settings should work on master-only nodes
Fixes #2675
2013-02-26 08:54:46 -05:00