Commit Graph

4053 Commits

Author SHA1 Message Date
Martijn van Groningen bf21466291 CacheTests test fix. 2013-04-12 19:14:38 +02:00
Martijn van Groningen 80dbca0809 Field data: Try to load short values as byte values and load int values as short or byte values to reduce the size they take in memory. 2013-04-12 19:11:18 +02:00
Martijn van Groningen 5c90e5f940 If no options are specified with the clear cache api then all caches should be cleared.
Closes #2886
2013-04-12 15:24:50 +02:00
Igor Motov 00c035f88c Make sure that settings are propagated to all nodes 2013-04-11 10:59:14 -04:00
Martijn van Groningen 2dfcc3c740 Test that size is actually computed.
Relates to #2882
2013-04-11 10:22:48 +02:00
Simon Willnauer 9a2d27a035 rename prefix_length to prefix_len for consistency
Closes #2883
2013-04-10 17:39:32 +02:00
Martijn van Groningen 6a3c53ef44 Should prevent OOM 2013-04-10 10:00:51 +02:00
Martijn van Groningen b8b28041e5 Fix for extended facets test. 2013-04-10 00:47:00 +02:00
Igor Motov b0e44a2b40 Fix term counters in script field terms facet
Fixes #2878
2013-04-09 12:42:35 -04:00
Simon Willnauer ae74a8dbb7 Configure FieldData using a hash not a string
Closes #2876
2013-04-09 15:53:05 +02:00
Simon Willnauer 374bbbfa7b # FieldData Filter
FieldData is an in-memory representation of the term dictionary in an uninverted form. Under certain circumstances this FieldData representation can grow very large on high-cardinality fields like tokenized full-text. Depending on the use case, filtering the terms that are held in the FieldData representation can heavily improve execution performance and application stability.
FieldData Filters can be applied on a per-segment basis. During FieldData loading the terms enumeration is passed through a filter predicate that either accepts or rejects a term.

## Frequency Filter

The Frequency Filter acts as a high / low pass filter based on the document frequency of a term within the segment that is loaded into field data. It rejects terms that are very frequent or very infrequent, based on absolute frequencies or on percentages relative to the number of documents in the segment or, more precisely, the number of documents that have at least one value in the field loaded for the current segment.

Here is an example mapping:

```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.frequency.min=0.001;filter.frequency.max=0.1",
                "index" : "analyzed"
            }
        }
    }
}
```
### Parameters

 * `filter.frequency.min` - the minimum document frequency (inclusive) a term must have in order to be loaded into memory. Interpreted as a fraction of the documents if < `1.0`, otherwise as an absolute value. `0` if omitted.
 * `filter.frequency.max` - the maximum document frequency (inclusive) a term may have in order to be loaded into memory. Interpreted as a fraction of the documents if < `1.0`, otherwise as an absolute value. `0` if omitted.
 * `filter.frequency.min_segment_size` - the minimum number of documents in a segment in order for the filter to be applied. Small segments can be excluded from filtering with this setting.
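
The parameter semantics above can be sketched as a small predicate. This is an illustrative Python sketch, not the FieldData implementation: the function names are hypothetical, an omitted `max` is treated as "no upper bound" (an assumption, since a literal maximum of `0` would reject every term), and small segments are assumed to load all terms unfiltered.

```python
def resolve_threshold(value, docs_with_field):
    """Values below 1.0 are fractions of docs_with_field; others are absolute counts."""
    return value * docs_with_field if value < 1.0 else value


def accept_term(doc_freq, docs_with_field, segment_docs,
                min_freq=0.0, max_freq=None, min_segment_size=0):
    """Return True if a term with the given document frequency should be loaded."""
    # Assumption: the filter is skipped entirely for segments below min_segment_size.
    if segment_docs < min_segment_size:
        return True
    # Lower bound is inclusive.
    if doc_freq < resolve_threshold(min_freq, docs_with_field):
        return False
    # Assumption: max_freq=None means no upper bound; the bound itself is inclusive.
    if max_freq is not None and doc_freq > resolve_threshold(max_freq, docs_with_field):
        return False
    return True
```

With the mapping above (`min=0.001`, `max=0.1`) and 10,000 documents that have a value in the field, terms occurring in fewer than 10 or more than 1,000 documents would be rejected under this reading.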

## Regular Expression Filter

The regular expression filter applies a regular expression to each term during loading and only loads terms into memory that match the given regular expression.

Here is an example mapping:

```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.regex=^en_.*",
                "index" : "analyzed"
            }
        }
    }
}
```
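
As a rough illustration of the predicate, the following Python sketch keeps only terms that match the expression from the mapping above. The function name is hypothetical, and the exact matching semantics of the real filter (whole-term match versus partial match) are an assumption here; for an anchored pattern such as `^en_.*` both readings give the same result.

```python
import re


def filter_terms(terms, pattern):
    """Keep only the terms accepted by the regular expression."""
    regex = re.compile(pattern)
    # Assumption: a term is accepted if the pattern matches from its start.
    return [term for term in terms if regex.match(term)]


loaded = filter_terms(["en_US", "en_GB", "de_DE", "fr_en_"], r"^en_.*")
```

Here `loaded` holds `["en_US", "en_GB"]`; the other terms would never reach memory.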

Closes #2874
2013-04-09 11:34:48 +02:00
Simon Willnauer a10c80e20f ensure that modifications to the enum order trigger test failures since we rely on the ordinal 2013-04-08 23:29:56 +02:00
Simon Willnauer 7e77ddb88f use enum to represent flags and fail if flags are not respected 2013-04-08 22:56:11 +02:00
Shay Banon 15d7ae5983 FieldData Stats: Add field data stats to indices stats API
closes #2870
2013-04-07 18:30:24 -07:00
Martijn van Groningen 86c1714bf3 Also test the `fields` option. 2013-04-07 21:52:19 +02:00
Shay Banon 84670212a6 Filter / Id Cache Stats: Add to Indices Stats API, revise node stats API
closes #2862
2013-04-05 20:02:32 +02:00
Simon Willnauer 5e7ad9832c Added more evil tests for different field data implementations 2013-04-05 18:12:50 +02:00
Martijn van Groningen 224faffead Added an extended test for terms facet with a decent number of documents / field values and randomly tests various options. Also fixed an issue where `regex` and `excludes` were ignored when `all_terms` was used. 2013-04-05 17:38:46 +02:00
David Pilato 4b1ec037f8 Fix test for #2668. 2013-04-05 15:00:28 +02:00
Martijn van Groningen 9b5c74d43e Made sure `all_terms` works consistently. In some cases the `all_terms` option was ignored: * Faceting on number based fields. * The `execution_type` was set to `map`. * In the case the `fields` option was used.
Closes #2861
2013-04-05 14:27:19 +02:00
David Pilato 36b92be212 List of existing plugins with Node Info API
We want to display information about loaded plugins in Node Info API using plugin option:

```sh
curl http://localhost:9200/_nodes?plugin=true
```

For example, on a 4-node cluster, it could provide the following output:

```javascript
{
  "ok" : true,
  "cluster_name" : "test-cluster-MacBook-Air-de-David.local",
  "nodes" : {
    "lodYfbFTRnmwE6rjWGGyQQ" : {
      "name" : "node1",
      "transport_address" : "inet[/172.18.58.139:9300]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9200]",
      "plugins" : [ ]
    },
    "hJLXmY_NTrCytiIMbX4_1g" : {
      "name" : "node4",
      "transport_address" : "inet[/172.18.58.139:9303]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9203]",
      "plugins" : [ {
        "name" : "test-plugin",
        "description" : "test-plugin description",
        "site" : true,
        "jvm" : false
      }, {
        "name" : "test-no-version-plugin",
        "description" : "test-no-version-plugin description",
        "site" : true,
        "jvm" : false
      }, {
        "name" : "dummy",
        "description" : "No description found for dummy.",
        "url" : "/_plugin/dummy/",
        "site" : false,
        "jvm" : true
      } ]
    },
    "bnoySsBfTrSzbDRZ0BFHvg" : {
      "name" : "node2",
      "transport_address" : "inet[/172.18.58.139:9301]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9201]",
      "plugins" : [ {
        "name" : "dummy",
        "description" : "This is a description for a dummy test site plugin.",
        "url" : "/_plugin/dummy/",
        "site" : false,
        "jvm" : true
      } ]
    },
    "0Vwil01LSfK9YgRrMce3Ug" : {
      "name" : "node3",
      "transport_address" : "inet[/172.18.58.139:9302]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9202]",
      "plugins" : [ {
        "name" : "test-plugin",
        "description" : "test-plugin description",
        "site" : true,
        "jvm" : false
      } ]
    }
  }
}
```

Information is cached for 10 seconds by default. Modify the `plugins.info_refresh_interval` property if needed.
Setting `plugins.info_refresh_interval` to `-1` will cause infinite caching.
Setting `plugins.info_refresh_interval` to `0` will disable caching.
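
The three interval settings can be illustrated with a small cache wrapper. This is a Python sketch of the described behaviour only; the class name, the `loader` callback and the injectable `clock` are hypothetical and not part of the plugin code.

```python
import time


class CachedPluginInfo:
    """Caches the result of loader() according to a refresh interval in seconds."""

    def __init__(self, loader, refresh_interval=10.0, clock=time.monotonic):
        self._loader = loader
        self._interval = refresh_interval
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is not None:
            # A negative interval (-1) means the first result is cached forever.
            if self._interval < 0:
                return self._value
            # A positive interval serves the cached value until it expires;
            # an interval of 0 never satisfies this check, so caching is disabled.
            if now - self._loaded_at < self._interval:
                return self._value
        self._value = self._loader()
        self._loaded_at = now
        return self._value
```

Passing a fake `clock` makes the expiry behaviour easy to exercise without sleeping.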

Closes #2668.
2013-04-05 11:36:56 +02:00
Simon Willnauer f3e6fe094a beef up term facet tests 2013-04-05 11:05:24 +02:00
Simon Willnauer 9fbe075aec Added test that compares concurrent facet execution results with a serial execution result 2013-04-05 10:36:53 +02:00
Shay Banon 54f685674b Thread Pool: Update default settings (move from default cached to fixed)
closes #2858
2013-04-04 23:24:49 +02:00
Simon Willnauer d758401add Cleanup ScriptDocValues. This commit adds a getValues method to all ScriptDocValues for easy access
in scripts via doc['field'].values / value.
2013-04-04 16:07:54 +02:00
Alexander Reelsen 4f96b36376 Returning configuration of root field mappers toXContent method only if they are enabled 2013-04-04 15:55:12 +02:00
Alexander Reelsen 955788e9a5 Allowing to disable size field mapper after enabling 2013-04-04 09:41:41 +02:00
Alexander Reelsen e662e4d55d Allowing to disable index field mapper after enabling 2013-04-04 09:41:41 +02:00
Alexander Reelsen 9cc2563d5e Allowing to disable timestamp field mapper after enabling 2013-04-04 09:41:41 +02:00
Simon Willnauer 223ec2c42d Beef up FieldData tests by running one on one duels 2013-04-03 18:38:25 +02:00
Igor Motov 356329df00 Improve stability of ClusterHealthTests 2013-04-03 12:07:42 -04:00
Igor Motov d2f6349dcf Improve stability of MinimumMasterNodesTests 2013-04-03 11:51:28 -04:00
Martijn van Groningen 0a89c80554 Fixed issue where a doc is omitted from the hits if it has no geo point and sorting is based on geo distance.
Closes #2851
2013-04-03 17:25:16 +02:00
Simon Willnauer eb8b38d027 Upgrade to Lucene 4.2.1 2013-04-03 12:22:39 +02:00
Martijn van Groningen cf00acf5b0 If no specified index or alias exists and `ignore_indices` is set to `missing` an index missing error is returned instead of resolving to all open indices (e.g. when searching). This breaks backwards comp. with 0.20.x and before.
Closes #2837
2013-04-02 19:06:17 +02:00
Alexander Reelsen d866321c55 Merge pull request #2811 from spinscale/document-mapper-merge
Allow to update ttl field mapping after initial creation. Fixes #2136
2013-04-01 23:37:29 -07:00
Simon Willnauer 7efa92636a Cut over to IntsRef in favor of IntsArrayRef 2013-03-31 10:46:21 +02:00
Simon Willnauer b3356d9f8d remove dead code 2013-03-31 10:19:17 +02:00
Simon Willnauer 2a09342405 remove Bytes.java in favor of BytesRef / ArrayUtils 2013-03-31 08:54:39 +02:00
Simon Willnauer fefa8da2ea remove StringValues in favor of BytesValues 2013-03-30 17:35:23 +01:00
Simon Willnauer dff2a9279c clean-up double values 2013-03-30 17:35:23 +01:00
Simon Willnauer d5c271acf5 clean-up long values 2013-03-30 17:35:22 +01:00
Simon Willnauer 5aedf74fb0 Remove getValues from numeric and string field data & clean up geo field data 2013-03-30 17:35:22 +01:00
Simon Willnauer 7f81469137 Refactor BytesValues to be reused as the interface for HashedBytesValues and remove HashBytesValues. 2013-03-30 17:35:22 +01:00
Simon Willnauer 129f02623b Added FST based FieldData implementation holding all data in a per segment FST.
This commit factors out a common API for BytesValues-based implementations into shared code and reduces code duplication.
2013-03-30 17:35:22 +01:00
Martijn van Groningen a89dde8bac Fixed `bool` filter bugs:
* In the case where only should clauses were specified with specific types of filters, the first clause determined which documents matched.
* In some cases the behaviour that at least one should clause must match was broken.
2013-03-29 16:48:36 +01:00
Alexander Reelsen a880a6c85e Allow to update ttl field mapping after initial creation. Fixes #2136
Adding possibility to change TTL field mapper data without specifying enabled flag in mapping update
2013-03-28 17:25:28 +01:00
Martijn van Groningen 941aa17a43 Added sort mode to geo distance sorting. Closes #1846 2013-03-28 17:04:42 +01:00
Igor Motov 9bc50ea609 Fix LeastUsedDistributor and ensure random distribution for multiple non-fs directories
If we cannot determine available space the fallback scenario is now to use random distribution instead of always using the last directory.
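
The fallback described above can be sketched as follows. This Python snippet is an illustration only, not the LeastUsedDistributor code: the function name and the `usable_space` callback are hypothetical, and treating a reported size of `0` or `None` as "cannot be determined" is an assumption.

```python
import random


def pick_directory(directories, usable_space, rng=random):
    """Pick the directory with the most usable space, or a random one if space is unknown."""
    # Assumption: 0 or None means the available space could not be determined.
    known = [(usable_space(d), d) for d in directories if usable_space(d)]
    if not known:
        # Fallback: random distribution instead of always using the last directory.
        return rng.choice(directories)
    return max(known, key=lambda pair: pair[0])[1]
```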

Fixes #2820
2013-03-28 11:08:54 -04:00
Shay Banon 1fc37e5954 Segments API: Add version & compound for each segment
closes #2823
2013-03-28 15:34:38 +01:00
Igor Motov 5bb75f9da3 Move applying alias filter to ContextSearch#preProcess() 2013-03-27 09:23:54 -04:00
Simon Willnauer 17f83f33bb Terminate early when no terms left in the suggest string.
Closes #2817
2013-03-26 17:44:34 +01:00
Igor Motov 9ae421a8b2 Fix filtering aliases with non-empty sort options
Fixes #2816
2013-03-26 07:23:44 -04:00
Simon Willnauer aa97c031f2 Don't reset tokenstream before passing to the MemoryIndex, otherwise some tokenizer might swallow tokens.
Closes #2814
2013-03-25 22:46:11 +01:00
adavis 6a93fbcf07 Adding parsing for zero terms query for multi match
Tests for multi-match zero_terms_query and making references to the ZeroTermsQuery enum consistent to others used in MultiMatchQueryBuilder
2013-03-23 08:59:39 +01:00
adavis 3f83904680 Fixes java6_u31 compile error w.r.t. type inference 2013-03-22 16:46:42 -07:00
Florian Schilling 1a67793a4b Added Script test for geo distance tests and modified GeoUtils.normalizePoint() 2013-03-22 13:34:18 +01:00
Simon Willnauer 075779a397 Call onMissing if doc has no value in the field.
Closes #2807
2013-03-21 22:45:17 +01:00
Simon Willnauer 064d272916 Respect offset and length when iterating over BytesRef in Uid. The length is starting at offset
Closes #2806
2013-03-21 19:29:05 +01:00
Florian Schilling f08d458545 # GeoShape Precision
The `geo_shape` precision could only be set via `tree_levels` so far. A new option `precision` now allows the levels of the underlying tree structures to be set by distances like `50m`. The def

## Example
```sh
curl -XPUT 'http://127.0.0.1:9200/myindex/' -d '{
  "mappings" : {
      "type1": {
          "dynamic": "false",
          "properties": {
              "location" : {
                  "type" : "geo_shape",
                  "geohash" : "true",
                  "store" : "yes",
                  "precision":"50m"
              }
          }
      }
  }
}'
```

## Changes
- GeoUtils defines the [WGS84](http://en.wikipedia.org/wiki/WGS84) reference ellipsoid of earth
- DistanceUnits refer to a more precise definition of earth circumference
- DistanceUnits for inch, yard and meter have been defined
- Set default levels in GeoShapeFieldMapper to 50m precision

Closes #2803
2013-03-20 14:52:47 +01:00
Simon Willnauer 4705eb2959 Lazily initialize the delegate in BloomFilteredPostingsFormat to prevent unnecessary loading if bloomfilter terminates early 2013-03-20 12:43:17 +01:00
Simon Willnauer 747ce36915 Specialise the default codec to reuse Lucene41 files in the common case.
Closes #2799
2013-03-20 12:43:17 +01:00
Shay Banon 7d9cef904b Field Data: optimize long type to use narrowest possible type automatically
closes #2795
2013-03-18 12:37:15 +01:00
Simon Willnauer c25eb7defe Fix bug in RateLimiter.SimpleRateLimiter causing numeric overflow in StoreStats
Closes #2785
2013-03-15 23:36:31 +01:00
Florian Schilling 25bd9cecd0 # REST Suggester API
The REST Suggester API binds the 'Suggest API' to the REST Layer directly. Hence there is no need to touch the query layer for requesting suggestions.
This API extracts the Phrase Suggester API and makes 'suggestion request' top-level objects in suggestion requests. The complete API can be found in the
underlying ["Suggest Feature API"](http://www.elasticsearch.org/guide/reference/api/search/suggest.html).

# API Example
The following example shows how suggest actions work on the REST layer: a simple request followed by its response.

## Suggestion Request
```sh
curl -XPOST 'localhost:9200/_suggest?pretty=true' -d '{
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
        "phrase" : {
            "analyzer" : "bigram",
            "field" : "bigram",
            "size" : 1,
            "real_word_error_likelihood" : 0.95,
            "max_errors" : 0.5,
            "gram_size" : 2
        }
    }
}'
```
This example shows how to query a suggestion for the global text 'Xor the Got-Jewel'. A 'simple phrase' suggestion is requested and
a 'direct generator' is configured to generate the candidates.

## Suggestion Response
On success the request above will reply with a response like the following:
```json
{
    "simple_phrase" : [ {
        "text" : "Xor the Got-Jewel",
        "offset" : 0,
        "length" : 17,
        "options" : [ {
            "text" : "xorr the the got got jewel",
            "score" : 3.5283546E-4
        } ]
    } ]
}
```
The 'suggest' response contains a single 'simple phrase' entry, which in turn contains an 'option'. This option represents a suggestion for the
queried text. It contains the corrected text and a score indicating how likely it is that this option was meant.

Closes #2774
2013-03-13 19:36:29 +01:00
Alexander Reelsen 125b33d3dc GeoJSONShapeParser parses JSON correctly and extracts coordinates even if 'crs' field is included.
Fixes #2763
2013-03-13 15:17:21 +01:00
Simon Willnauer 365cde82d3 Use numOrds rather than numDocs as upperbound for sorting
Closes #2773
2013-03-13 15:13:56 +01:00
Igor Motov 3a534c64e5 Add dynamic settings validation
Fixes #2749
2013-03-12 14:41:00 -04:00
Simon Willnauer c5395436e6 fix test bug where a small time window exists that can trigger a false failure due to default concurrent recoveries 2013-03-12 14:48:14 +01:00
Simon Willnauer 237c4ddf54 Introduce ParentIdCollector that collects only if the parent ID is non-null, i.e. if the document has a parent.
Closes #2744
2013-03-11 21:11:05 +01:00
Simon Willnauer 9442c41481 enable testcase that relied on a Lucene 4.2 fix 2013-03-11 12:57:24 +01:00
Simon Willnauer 4e7cff488e add test that ensures that we bump the version on a Lucene Upgrade 2013-03-11 10:30:48 +01:00
Simon Willnauer ebadd9ebbd Fix tests since Lucene 4.2 we can support date math in Fuzzy-Search Syntax 2013-03-11 08:23:01 +01:00
Simon Willnauer a37f1f55cc Add tests for highlighting boost query.
Closes #1314
2013-03-11 08:23:01 +01:00
Simon Willnauer 11bf7a8b1a Upgrade to Lucene 4.2 2013-03-11 08:23:01 +01:00
Simon Willnauer 75fd6d4985 Added KeywordRepeatFilter that allows emitting stemmed and unstemmed versions of the same token if stemmers are used
Closes #2753
2013-03-09 23:09:59 +01:00
Simon Willnauer dc9a052287 Respect CandidateGenerator#size if set in the request and reduce the total #of candidates to the shard size.
Closes #2752
2013-03-09 13:36:40 +01:00
Shay Banon eb956e7c09 Term/Terms filters on numeric fields gives wrong result
fixes #2746
2013-03-07 22:12:22 -08:00
Benjamin Devèze 35f5ca915d Add support for ignore_indices to delete by query
Closes #2734
2013-03-07 10:17:51 +01:00
Shay Banon e1409a9f0e Problems with range searches for time with lte
fixes #2731
2013-03-05 18:10:30 -08:00
Igor Motov acff102234 Implement search shards API
Closes #2726
2013-03-05 09:17:59 -05:00
Simon Willnauer 1eb24d7efc use a base ShingleFilterFactory to simplify default shingle detection 2013-03-05 12:32:50 +01:00
Simon Willnauer 876b5a3dcd prefer totalTermFrequency over docFreq in PhraseSuggester 2013-03-05 10:46:25 +01:00
Simon Willnauer 315744be55 Set shardSize according to the total size if not explicitly specified. Closes #2729 2013-03-05 09:22:23 +01:00
Shay Banon 3e264f6b95 cleanup deletion of content in shards
we are very conservative on when we delete data, remove the actual options of deleting data that are not used
2013-03-04 20:41:19 -08:00
Drew Raines a8d52b58b6 Remove obsolete test. 2013-03-04 15:22:40 -06:00
Benjamin Devèze 09f20e3d4c Fix bug when searching concrete and routing aliased indices
Closes #2683
2013-03-03 14:31:57 +01:00
uboness 881cb7900c Change geo_shapes support:
* Exposed the spatial strategy to be configurable as part of the geo_shape mappings
* Exposed the spatial strategy to be customizable at query time (will be used to generate the geo_shape filter/query)
* Removed XTermQueryPrefixTreeStrategy and reverted to use the lucene TermQueryPrefixTreeStrategy instead
* Made the RecursivePrefixTreeStrategy the default strategy to be used
* Removed support for all spatial operations except "intersects"
* Updated both the GeoShapeQueryBuilder and GeoShapeFilterBuilder with all the changes (removed the option of specifying the operation type (as only intersects is supported) and added the option of setting the filter/query spatial strategy

Closes #2720
2013-03-02 17:13:58 +01:00
Shay Banon 50d121315b add ability for cluster health to wait for current events to be processed
help with tests that run on slow machines
2013-03-02 14:25:45 +01:00
Shay Banon ea097afd91 add proper testing for bool filter 2013-03-02 01:07:05 +01:00
Shay Banon 361d6bf89a spin a bit to wait for condition in test, so slow machines will still run it correctly 2013-03-01 23:36:13 +01:00
Shay Banon 9b68e98ea2 more strict check before trying to parse and detect a string as a date
fixes #2694
2013-03-01 22:15:32 +01:00
Simon Willnauer aaa3c48b3c Throw IAE if indices is null or contains a null value.
Closes #2656
2013-03-01 21:26:23 +01:00
Simon Willnauer fced68c22d ensure that suggestion only added on reduce if they are present in the shard response 2013-03-01 21:09:10 +01:00
Martijn van Groningen d99b532f0f Supporting sort modes `avg` and `sum` when sorting inside nested objects.
Before this commit, either sort mode `min` or `max` (depending on sort order) was used when sort mode `avg` or `sum` was picked.

Closes #2701
2013-03-01 19:53:20 +01:00
Simon Willnauer 39f362326e Short-circuit the response if no indices exist and make sure the listener is notified.
Closes #2692
2013-03-01 15:15:56 +01:00
Simon Willnauer 3c1f291801 Fail in metadata parsing if the id path is not a value but rather an array or an object.
Closes #2275
2013-03-01 13:00:29 +01:00
Shay Banon 30075bb6f9 add info in test for actual search failures 2013-03-01 00:00:09 +01:00
Shay Banon 849a3677cd improve timing in test to wait for state with graceful timeouts
(yet, validate early and exit when relevant)
2013-02-28 23:44:52 +01:00
Simon Willnauer c90c5cbf85 fix bug in StupidBackoffScorer where previous word and current word were flipped, creating a non-existing bigram 2013-02-28 21:23:41 +01:00
Simon Willnauer d4ec03ed76 # Phrase Suggester
The `term` suggester provides a very convenient API to access word alternatives on token
basis within a certain string distance. The API allows accessing each token in the stream
individually while suggest-selection is left to the API consumer. Yet, often already ranked
/ selected suggestions are required in order to present to the end-user.
Inside ElasticSearch we have the ability to access far more statistics and information quickly
to make better decisions about which token alternative to pick, or whether to pick an alternative at all.

This `phrase` suggester adds some logic on top of the `term` suggester to select entire
corrected phrases instead of individual tokens, weighted based on *n-gram language models*. In practice it
will be able to make better decisions about which tokens to pick based on co-occurrence and frequencies.
The current implementation is kept quite general and leaves room for future improvements.

# API Example

The `phrase` request is defined alongside the query part in the JSON request:

```sh
curl -s -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "body",
        "field" : "bigram",
        "size" : 1,
        "real_word_error_likelihood" : 0.95,
        "max_errors" : 0.5,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "body",
          "suggest_mode" : "always",
          "min_word_len" : 1
        } ]
      }
    }
  }
}'
```

The response contains suggestions sorted by the most likely spell correction first. In this case we got the expected correction
`xorr the god jewel` first, while the second correction is less conservative in that only one of the errors is corrected. Note that the request
is executed with `max_errors` set to `0.5`, so 50% of the terms can contain misspellings (see parameter descriptions below).

```json
  {
  "took" : 37,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2938,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "simple_phrase" : [ {
      "text" : "Xor the Got-Jewel",
      "offset" : 0,
      "length" : 17,
      "options" : [ {
        "text" : "xorr the god jewel",
        "score" : 0.17877324
      }, {
        "text" : "xor the god jewel",
        "score" : 0.14231323
      } ]
    } ]
  }
}
```

# Phrase suggest API

## Basic parameters

* `field` - the name of the field used to do n-gram lookups for the language model. The suggester uses this field to gather statistics to score corrections.
* `gram_size` - sets the max size of the n-grams (shingles) in the `field`. If the field doesn't contain n-grams (shingles) this should be omitted or set to `1`.
* `real_word_error_likelihood` - the likelihood of a term being misspelled even if the term exists in the dictionary. The default is `0.95`, corresponding to 5% of the real words being misspelled.
* `confidence` - the confidence level defines a factor applied to the input phrase's score, which is used as a threshold for other suggest candidates. Only candidates that score higher than the threshold will be included in the result. For instance a confidence level of `1.0` will only return suggestions that score higher than the input phrase. If set to `0.0` the top N candidates are returned. The default is `1.0`.
* `max_errors` - the maximum percentage of the terms that can be considered misspellings in order to form a correction. This option accepts a float value in the range `[0..1)` as a fraction of the actual query terms, or a number `>=1` as an absolute number of query terms. The default is `1.0`, which means that only corrections with at most 1 misspelled term are returned.
* `separator` - the separator that is used to separate terms in the bigram field. If not set the whitespace character is used as a separator.
* `size` - the number of candidates that are generated for each individual query term. Low numbers like `3` or `5` typically produce good results. Raising this can bring up terms with higher edit distances. The default is `5`.
* `analyzer` - sets the analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field passed via `field`.
* `shard_size` - sets the maximum number of suggested terms to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to `5`.
* `text` - sets the text / query to provide suggestions for.
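
To make the `max_errors` and `confidence` descriptions concrete, here is a small Python sketch. It is an illustration under stated assumptions rather than the suggester's code: the function names and the `top_n` parameter are hypothetical, and the rounding rule (round half up, with a minimum of one misspelling) is an assumption.

```python
import math


def max_misspelled_terms(max_errors, num_terms):
    """Translate max_errors into an absolute number of terms that may be corrected."""
    if max_errors >= 1:
        # Values >= 1 are absolute numbers of query terms.
        return int(max_errors)
    # Values in [0..1) are a fraction of the query terms.
    # Assumption: round half up, and always allow at least one misspelling.
    return max(1, math.floor(max_errors * num_terms + 0.5))


def select_suggestions(input_score, candidates, confidence=1.0, top_n=5):
    """Keep candidates (text, score) that score above confidence * input_score."""
    threshold = confidence * input_score
    kept = [c for c in candidates if c[1] > threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:top_n]
```

For a four-term input, `max_errors` of `0.5` allows two corrected terms, while the default of `1.0` allows one. With `confidence` set to `0.0` and positive scores, every candidate passes the threshold and only the top N cut applies.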

## Smoothing Models
The `phrase` suggester supports multiple smoothing models to balance weight between infrequent grams (grams (shingles) that do not exist in the index) and frequent grams (grams that appear at least once in the index).
* `laplace` - the default model that uses an additive smoothing model where a constant (typically `1.0` or smaller) is added to all counts to balance weights. The default `alpha` is `0.5`.
* `stupid_backoff` - a simple backoff model that backs off to lower order n-gram models if the higher order count is `0` and discounts the lower order n-gram model by a constant factor. The default `discount` is `0.4`.
* `linear_interpolation` - a smoothing model that takes the weighted mean of the unigrams, bigrams and trigrams based on user supplied weights (lambdas). Linear Interpolation doesn't have any default values. All parameters (`trigram_lambda`, `bigram_lambda`, `unigram_lambda`) must be supplied.
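
The three models can be illustrated with their textbook forms over unigram and bigram counts. This Python sketch is not the scorer code from this commit: the function names are hypothetical, trigrams are left out to keep it short, and the counts come from a toy token list rather than index statistics.

```python
from collections import Counter


def build_counts(tokens):
    """Count unigrams and adjacent bigrams in a token list."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def laplace(prev, word, unigrams, bigrams, alpha=0.5):
    """Additive smoothing: add alpha to every count."""
    vocab = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)


def stupid_backoff(prev, word, unigrams, bigrams, discount=0.4):
    """Use the bigram estimate if seen, else back off to a discounted unigram estimate."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    return discount * unigrams[word] / sum(unigrams.values())


def linear_interpolation(prev, word, unigrams, bigrams, bigram_lambda, unigram_lambda):
    """Weighted mean of the bigram and unigram estimates (weights must be supplied)."""
    p_uni = unigrams[word] / sum(unigrams.values())
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return bigram_lambda * p_bi + unigram_lambda * p_uni


unigrams, bigrams = build_counts("the god jewel the god".split())
```

On this toy data the unseen bigram `("the", "jewel")` still receives a small non-zero score from every model, which is the point of smoothing: an unseen gram should not zero out an entire phrase.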

## Candidate Generators
The `phrase` suggester uses candidate generators to produce a list of possible terms per term in the given text. A single candidate generator is similar to a `term` suggester called for each individual term in the text. The output of the generators is subsequently scored in combination with the candidates from the other terms to form suggestion candidates.
Currently only one type of candidate generator is supported, the `direct_generator`. The Phrase suggest API accepts a list of generators under the key `direct_generator`; each of the generators in the list is called per term in the original text.

## Direct Generators

The direct generators support the following parameters:

* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum number of corrections to be returned per suggest text token.
* `suggest_mode` - The suggest mode controls which suggestions are included, or for which suggest text terms suggestions should be made. Three possible values can be specified:
 * `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
 * `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
 * `always` - Suggest any matching suggestions based on terms in the suggest text.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimal number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur at the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `max_inspections` - A factor that is multiplied with the `shard_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum threshold in number of documents a suggest text token can exist in, in order to be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified then it cannot be fractional. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly; on top of this, excluding them also improves spellcheck performance. The shard level document frequencies are used for this option.
* `pre_filter` - a filter (analyzer) that is applied to each of the tokens passed to this candidate generator. This filter is applied to the original token before candidates are generated. (optional)
* `post_filter` - a filter (analyzer) that is applied to each of the generated tokens before they are passed to the actual phrase scorer. (optional)
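
The three `suggest_mode` values can be sketched as a filter over generated candidates. This Python snippet illustrates the descriptions above and is not the generator's implementation; the function name and the `(text, doc_freq)` candidate tuples are hypothetical.

```python
def candidates_for_term(mode, term_doc_freq, candidates):
    """Filter (text, doc_freq) candidates for one suggest text term by suggest mode."""
    if mode == "missing":
        # Only suggest for terms that are not in the index at all.
        return [] if term_doc_freq > 0 else list(candidates)
    if mode == "popular":
        # Only keep candidates that occur in more docs than the original term.
        return [c for c in candidates if c[1] > term_doc_freq]
    if mode == "always":
        return list(candidates)
    raise ValueError("unknown suggest_mode: %r" % mode)
```

For a term that already occurs in 3 documents, `missing` yields nothing, `popular` keeps only candidates with a document frequency above 3, and `always` keeps everything.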

The following example shows a `phrase` suggest call with two generators: the first one uses a field containing ordinary indexed terms and the second one uses a field whose
terms are indexed with a `reverse` filter (tokens are indexed in reverse order). This is used to overcome the limitation of the direct generators, which require a constant prefix to provide high-performance suggestions. The `pre_filter` and `post_filter` options accept ordinary analyzer names.

```sh
curl -s -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "body",
        "field" : "bigram",
        "size" : 4,
        "real_word_error_likelihood" : 0.95,
        "confidence" : 2.0,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "body",
          "suggest_mode" : "always",
          "min_word_len" : 1
        }, {
          "field" : "reverse",
          "suggest_mode" : "always",
          "min_word_len" : 1,
          "pre_filter" : "reverse",
          "post_filter" : "reverse"
        } ]
      }
    }
  }
}'
```

`pre_filter` and `post_filter` can also be used to inject synonyms after candidates are generated. For instance, for the query `captain usq` we might generate the candidate `usa` for the term `usq`. Since `usa` is a synonym for `america`, we can present `captain america` to the user if this phrase scores high enough.

Closes #2709
2013-02-28 16:17:59 +01:00
Shay Banon 7400c30eba fail a shard if a merge failure occurs 2013-02-27 23:44:55 +01:00
Simon Willnauer 7be8f431d5 move id tests into SimpleQueryTests 2013-02-27 19:03:42 +01:00
Simon Willnauer 8ab602ec81 Fix AIOOB exception in UID type/id tuple creation.
Closes #2695
2013-02-27 18:58:27 +01:00
Martijn van Groningen 2b5e3f5586 Fixed resolving closest nested object when sorting on a field inside nested object 2013-02-25 16:21:22 +01:00
Shay Banon bde36647fb Terms/Ids filter: Support empty list of values, resulting in no match for it
closes #2687
also closes #2686
2013-02-25 12:26:49 +01:00
Shay Banon 4145d154bb add a test for empty lookup terms filter 2013-02-25 11:58:58 +01:00
Shay Banon 595e0e254e [Code refactoring] IndicesStats -> IndicesStatsResponse
fixes #1782
2013-02-23 14:23:36 +01:00
David Pilato 4c493ac71d Revert changes on *Request classes from issue
Relative to #2657
2013-02-23 10:37:56 +01:00
David Pilato a646e126e9 Display list of all available site plugins on /_plugins/ end point fix #2664 2013-02-23 09:34:06 +01:00
Igor Motov b8cc8e56c4 Improve stability of SimpleRobinEngineTests 2013-02-22 14:59:49 -05:00
Shay Banon 03fdc6aa80 Query DSL: Terms filter to allow for terms lookup from another document
closes #2674
2013-02-22 14:04:10 +01:00
Igor Motov ec3492c67c Improve stability of the testReusePeerRecovery test 2013-02-21 16:06:33 -05:00
Igor Motov ce6f0e27bf Make file distribution among several disks configurable
Fixes #2650
2013-02-19 21:43:43 -05:00
David Pilato b7afa0f44e Fix test for Support trailing slashes on plugin _site URLs #2654 2013-02-19 21:16:47 +01:00
Martijn van Groningen 3b31c1216e Made the `term_vector` json field the leading way of configuring term vectors. Supported options: `no`, `yes`, `with_offsets`, `with_positions`, `with_positions_offsets` and `with_positions_offsets_payloads`. 2013-02-19 20:55:43 +01:00
Igor Motov 5b9e9a004a Make sure that in SitePluginTests http client connects to the correct node and closes the node after the test 2013-02-19 14:42:24 -05:00
Igor Motov d126558dec Add check for health timeout to shardCleanup test 2013-02-19 13:12:26 -05:00
David Pilato 8ab9d2dd1f Support trailing slashes on plugin _site URLs fix #2654 2013-02-19 09:21:45 +01:00
Igor Motov cfaa859bb2 Improve stability of UpdateNumberOfReplicasTests 2013-02-18 20:12:39 -05:00
Igor Motov 5746c50ef9 Improve stability of shardsCleanup test 2013-02-18 19:35:12 -05:00
Igor Motov 183a74c866 Improve stability of testSimpleAwareness test 2013-02-18 19:31:07 -05:00
Martijn van Groningen 303e87fb69 Added support for sorting by fields inside one or more nested objects.
The sorting by nested field support has the following parameters on top of the already existing sort options:

nested_path - Defines which nested object to sort on. The actual sort field must be a direct field inside this nested object. The default is to use the most immediate inherited nested object from the sort field.
nested_filter - A filter that the inner objects inside the nested path should match in order for their field values to be taken into account by sorting. A common case is to repeat the query / filter inside the nested filter or query. By default no nested_filter is active.
Either the highest (max) or lowest (min) inner object is picked during sorting, depending on the sort_mode being used. The sort_mode options avg and sum can still be used for number based fields inside nested objects. All the values for the sort field are taken into account for each nested object.

Closes #2662
2013-02-18 22:10:41 +01:00
Simon Willnauer 8db436f107 Remove backported Lucene 4 spatial code in favor of the released version in Lucene 4.1 2013-02-18 18:43:55 +01:00
Jeffrey Gerard 0dfc2169d7 Added Testcase and BugFix fixing #2626 where GeoShape intersects filter omitted matching docs.
SpatialPrefixTree#recursiveGetNodes uses an optimization that prevents
recursion into the deepest tree level if a parent node in the penultimate
level covers all its children.  This produces a bug if the optimization
happens both at indexing and at query/filter time.

This patch fixes the bug by disabling the optimization at indexing time
(to avoid adding overhead for query-heavy workloads).

See LUCENE-4770 for reference
2013-02-18 18:43:47 +01:00
David Pilato cc83c2f848 refactoring getter/setters
Fixes #2657
2013-02-18 11:09:32 -05:00
Martijn van Groningen ac2e6a3a4d Fixed nested facets with filters. 2013-02-18 11:01:18 -05:00
Simon Willnauer 24291d40f4 Expose CJKWidthTokenFilter and CJKBigramTokenFilter
Closes #2660
2013-02-18 11:01:17 -05:00
Shay Banon 547bd7abf2 add our own bloom filter implementation
uses more hash iterations, yet requires less memory for the same fpp
relates to #2411
2013-02-18 11:01:17 -05:00
Shay Banon 73a447da86 initial facet refactoring
the main goal of the facet refactoring is to allow for two modes of facet execution: collector based, which gets callbacks as hits match, and post based, which iterates over all the relevant hits
it also includes some simplification of the facet implementation
2013-02-16 02:25:04 +01:00
Shay Banon 06b82a45d4 Simplified range syntax when using a query string
closes #2655
2013-02-15 01:30:55 +01:00
Shay Banon 4714a6acc9 Clear cache: allow to invalidate specific filter cache keys
closes #2653
2013-02-14 21:13:19 +01:00
Igor Motov 37f16127c5 Fix ScriptFilter cache key calculation
Fixes #2651
2013-02-14 06:13:26 -05:00
Shay Banon f41eccc7a5 updating non dynamic settings throws an error now 2013-02-13 14:28:16 +01:00
Shay Banon 5519f80abb add increased timeout waiting for relocation when running on small boxes 2013-02-12 21:23:18 +01:00
Martijn van Groningen fc13499ff5 Added `sort_mode` option that defines what value to pick in the case the sort field is multi-valued.
The `min` and `max` sort modes are supported for all field types. Either the lowest value or the highest value is picked. In addition to that number based fields also support `sum` and `avg` as sort mode. If `sum` sort mode is used then all the values for a field and belonging to a document are added together and the result of that is used as sort value. If the `avg` sort mode is used then the average of all values for the sort field belonging to that document is used as sort value.
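
As an illustration of the four modes described above, here is a minimal Python sketch. The function name `sort_value` is hypothetical and this is not the Elasticsearch implementation; it only restates the described semantics for the values of one document.

```python
def sort_value(values, sort_mode):
    """Reduce the values of a multi-valued sort field to a single sort value.

    `values` holds all values of the sort field for one document.
    `sum` and `avg` only make sense for number based fields.
    """
    if not values:
        raise ValueError("document has no value for the sort field")
    if sort_mode == "min":
        return min(values)  # lowest value is picked
    if sort_mode == "max":
        return max(values)  # highest value is picked
    if sort_mode == "sum":
        return sum(values)  # all values added together
    if sort_mode == "avg":
        return sum(values) / len(values)  # average of all values
    raise ValueError("unsupported sort_mode: %s" % sort_mode)
```

For a document whose field holds `[3, 1, 2]`, the `min` mode yields 1, `max` yields 3, `sum` yields 6 and `avg` yields 2.0.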

Relates to #2634
2013-02-12 20:38:24 +01:00
Shay Banon 7d13545e33 delete indices before running the tests 2013-02-12 19:28:48 +01:00
Shay Banon 668bcd0eb7 Bulk execution while a shard is replication might send erroneous version conflict failures for certain items
fixes #2642
2013-02-12 17:38:06 +01:00
Simon Willnauer a7bbab7e87 # Rescore Feature
The rescore feature allows rescoring documents returned by a query based
on a secondary algorithm. Rescoring is commonly used if a scoring algorithm
is too costly to be executed across the entire document set but efficient enough
to be executed on the Top-K documents scored by a faster retrieval method. Rescoring
can help to improve precision by reordering a larger Top-K window than actually
returned to the user. Typically it is executed on a window between 100 and 500 documents
while the actual result window requested by the user remains the same.

# Query Rescorer

The `query` rescorer executes a secondary query only on the Top-K results of the actual
user query and rescores the documents based on a linear combination of the user query's score
and the score of the `rescore_query`. Any exposed query can be executed as a
`rescore_query` and supports a `query_weight` as well as a `rescore_query_weight` to weight the
factors of the linear combination.
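
The linear combination can be sketched in a few lines of Python. This is an illustrative sketch, not the actual implementation: the function name `rescore_top_k`, the treatment of documents that do not match the `rescore_query` (a secondary score of 0), and leaving documents outside the window untouched are all assumptions.

```python
def rescore_top_k(hits, rescore_scores, window_size,
                  query_weight=1.0, rescore_query_weight=1.0):
    """Rescore the first `window_size` hits with a linear combination.

    hits: list of (doc_id, query_score), sorted by query_score descending.
    rescore_scores: dict doc_id -> score of the secondary query
                    (assumption: a missing entry counts as 0.0).
    """
    window, rest = hits[:window_size], hits[window_size:]
    combined = [
        (doc_id,
         query_weight * score
         + rescore_query_weight * rescore_scores.get(doc_id, 0.0))
        for doc_id, score in window
    ]
    # Only the window is reordered; hits below it keep their position.
    combined.sort(key=lambda hit: hit[1], reverse=True)
    return combined + rest
```

With `query_weight` 0.7 and `rescore_query_weight` 1.2, a hit that scores lower on the original query can move ahead of a higher-scoring one inside the window if it matches the secondary query well.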

# Rescore API

The `rescore` request is defined alongside the query part in the JSON request:

```json
curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    "match" : {
      "field1" : {
        "query" : "the quick brown",
        "type" : "boolean",
        "operator" : "OR"
      }
    }
  },
  "rescore" : {
    "window_size" : 50,
    "query" : {
      "rescore_query" : {
        "match" : {
          "field1" : {
            "query" : "the quick brown",
            "type" : "phrase",
            "slop" : 2
          }
        }
      },
      "query_weight" : 0.7,
      "rescore_query_weight" : 1.2
    }
  }
}'
```

Each `rescore` request is executed on a per-shard basis within the same roundtrip. Currently the rescore API
has only one implementation (the `query` rescorer), which modifies the result set in-place. Future developments
could include dedicated rescore results if needed by the implementation, e.g. a pair-wise reranker.
*Note:* Only regular queries are rescored; if the search type is set to `scan` or `count`, rescorers are not executed.

Closes #2640
2013-02-12 17:10:00 +01:00
Shay Banon c65aff7775 Index with no replicas might lose ongoing documents while relocating a shard
fixes #26421
2013-02-12 17:03:28 +01:00
uboness a2b87e28f6 fixed a bug in PrioritizedThreadPoolExecutor:
now execute(Runnable) satisfies the priority and fifo nature of same-priority runnables
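
The combination of priority ordering and FIFO ordering among equal priorities can be illustrated with a small Python sketch. It is not the Java `PrioritizedThreadPoolExecutor`; the class name `PrioritizedQueue` and the convention that a lower number means a higher priority are assumptions made for the illustration. The usual technique is to attach an insertion counter as a tie-breaker.

```python
import heapq
import itertools


class PrioritizedQueue:
    """Runs tasks by priority; equal priorities run in submission order."""

    def __init__(self):
        self._heap = []
        # Monotonic counter: breaks ties so same-priority tasks stay FIFO.
        self._counter = itertools.count()

    def submit(self, priority, task):
        # Assumption: a lower number means a higher priority.
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def run_all(self):
        results = []
        while self._heap:
            _, _, task = heapq.heappop(self._heap)
            results.append(task())
        return results
```

Without the counter, two entries with the same priority would be compared by the task object itself, which gives no ordering guarantee (and raises an error for plain functions in Python 3).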
2013-02-09 04:20:16 +01:00
uboness 6d9048f8cc added priority support for cluster state updates:
* URGENT:
    * cluster_reroute (api)
    * refresh-mapping
    * cluster_update_settings
    * reroute_after_cluster_update_settings
    * create-index
    * delete-index
    * index-aliases
    * remove-index-template
    * create-index-template
    * update-mapping
    * remove-mapping
    * put-mapping
    * open-index
    * close-index
    * update-settings

* HIGH
    * routing-table-updater
    * zen-disco-node_left
    * zen-disco-master_failed
    * shard-failed
    * shard-started

* NORMAL
    * all other actions
2013-02-09 01:14:57 +01:00
Simon Willnauer f5331c9535 Cleanup NumericFieldData. FieldData interfaces are reduced to long and double while internal
representations still operate on the actual datatypes.
2013-02-08 20:58:36 +01:00
Martijn van Groningen 1189a2c2c2 Extended mv sorting integration test 2013-02-08 15:24:56 +01:00
Martijn van Groningen 8c7779057c Added sort by field that have multiple values per document.
Closes #2634
2013-02-08 13:28:40 +01:00
Martijn van Groningen f97021b165 Fixes size assertion failure. 2013-02-07 16:50:54 +01:00
Martijn van Groningen e72e323c8a Attempt to fix "No active shards" failure 2013-02-07 10:14:10 +01:00
Lee Hinman ed43ad07d7 Throw a more meaningful message when no document is specified for indexing 2013-02-06 22:33:02 +01:00
Florian Schilling a52e01f3e5 Remove XTermsFilter and UidFilter in favour of Lucene 4.1 TermsFilter 2013-02-06 18:45:05 +01:00
Igor Motov 6890c9fa62 Move action.wait_on_mapping_change setting to pom 2013-02-06 11:48:58 -05:00
Igor Motov ed09ba0a18 Improve stability of RecoveryPercolatorTests
Without "action.wait_on_mapping_change" setting set to true, the test node might get shutdown before updated mapping is saved.
2013-02-05 14:53:46 -05:00
Igor Motov 8277833f8d Fix settings processing in WordDelimiterTokenFilterFactory 2013-02-05 10:03:00 -05:00
Martijn van Groningen 19295280d9 Made sure that wrapped child query / parent query gets rewritten only once. 2013-02-05 10:27:31 +01:00
Igor Motov 9e89323ad2 Add proper cleanup to InternalSettingsPerparerTests 2013-02-04 19:58:40 -05:00
Martijn van Groningen 8109d13733 Use CacheRecycler when resolving parent docs in TopChildrenQuery. 2013-02-04 12:46:30 +01:00
Martijn van Groningen 9c3a86875b Removed `execution_type` for has_child and has_parent. 2013-02-04 11:37:40 +01:00
Shay Banon a8c9e580ed add getMaxOrd, and properly document the difference between it and numOrds 2013-02-01 16:13:13 +01:00
Igor Motov 45b2bff8da Improve SearchStatsTests
Added refresh to guarantee that at least something will be fetched on a fast computer.
2013-01-31 21:19:08 -05:00
Igor Motov 3c9541dd14 Make facet and sort tests more reliable in case of multiple nodes and shards
Stats, histogram and range facets and sorting currently fail if a field that they are running on is not defined in the mapping. In case of dynamic fields it might mean that by the time the facet query is executed the new field mapping might not be propagated to all nodes yet.
2013-01-31 21:19:07 -05:00
Igor Motov 6a01e7882c Improve shardsCleanup test
When startNode exits there is no guarantee that shard cleanup is finished because the cleanup operation is performed on another thread and startNode doesn't wait for it to complete. Therefore we might need to wait for the shard to disappear.
2013-01-31 21:18:14 -05:00
Igor Motov e32efba3d8 Improve RecoverAfterNodes tests 2013-01-31 20:05:55 -05:00
Simon Willnauer 1a1df06411 Move OrdsBuilding into a dedicated class and abstract integer pools used to build sparse ordinals 2013-01-31 19:02:31 +01:00
Martijn van Groningen 46dd42920c Remove scope support in query and facet dsl.
Remove support for the `scope` field in facets and the `_scope` field in the nested and parent/child queries. The scope support for nested queries will be replaced by the `nested` facet option and a facet filter with a nested filter. The nested filters will now support a `join` option, which controls whether to perform the block join. By default this is enabled, but when disabled it returns the nested documents as hits instead of the joined root document.

Search request with the current scope support.
```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
    "query" : {
		"nested" : {
			"path" : "offers",
			"query" : {
				"match" : {
					"offers.color" : "blue"
				}
			},
			"_scope" : "my_scope"
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "offers.size"
			},
			"scope" : "my_scope"
		}
	}
}'
```

The following is the functional equivalent of using the scope support:
```
curl -s -XPOST 'localhost:9200/products/_search?search_type=count' -d '{
    "query" : {
		"nested" : {
			"path" : "offers",
			"query" : {
				"match" : {
					"offers.color" : "blue"
				}
			}
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "offers.size"
			},
			"facet_filter" : {
				"nested" : {
					"path" : "offers",
					"query" : {
						"match" : {
							"offers.color" : "blue"
						}
					},
					"join" : false
				}
			},
			"nested" : "offers"
		}
	}
}'
```

The scope support for parent/child queries will be replaced by running the child query as filter in a global facet.

Search request with the current scope support:
```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
	"query" : {
		"has_child" : {
			"type" : "offer",
			"query" : {
				"match" : {
					"color" : "blue"
				}
			},
			"_scope" : "my_scope"
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "size"
			},
			"scope" : "my_scope"
		}
	}
}'
```

The following is the functional equivalent of using the scope support with parent/child queries:
```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
	"query" : {
		"has_child" : {
			"type" : "offer",
			"query" : {
				"match" : {
					"color" : "blue"
				}
			}
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "size"
			},
			"global" : true,
			"facet_filter" : {
				"term" : {
					"color" : "blue"
				}
			}
		}
	}
}'
```

Closes #2606
2013-01-31 15:09:57 +01:00
Martijn van Groningen 355381962b Use only the 'test' index, instead of all indices for child search benchmark. 2013-01-31 13:12:33 +01:00
Shay Banon 6cec73c201 remove fuzzy factor from mapping (internally implemented)
we want to support the ~ notation in the query parser for types other than strings. we are getting there: one can now do age:10~5. we would love to support it for dates, as in timestamp:2012-10-10~5d, but that requires changes in the query parser to support strings after the ~ sign
2013-01-31 12:23:03 +01:00
Igor Motov 8df7f2af0d Improve testReusePeerRecovery test 2013-01-30 19:51:41 -05:00
Igor Motov 29f4274213 Add index cleanup if index creation fails
Fixes #2590
2013-01-30 10:40:01 -05:00
Martijn van Groningen bc20f068c9 Made `search_analyzer` updateable via put mapping api.
Closes #2604
2013-01-30 11:49:20 +01:00
Simon Willnauer 5df37eaf75 add more advanced tests for phrase_prefix 2013-01-30 10:51:05 +01:00
Simon Willnauer 0697e2f23e use index prefix in tests to prevent misconfiguration 2013-01-28 15:51:06 +01:00
Simon Willnauer 72a2416a8c Support MultiPhrasePrefixQuery and MultiPhraseQuery in highlighters
Closes #2596
2013-01-28 15:41:25 +01:00
Martijn van Groningen 2e68207d6d Updated suggest api.
# Suggest feature
The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment the only supported suggester is `fuzzy`. The suggest feature is available from version `0.21.0`.

# Fuzzy suggester
The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query into account that is part of request.

# Suggest API
The suggest request part is defined alongside the query part as a top-level field in the JSON request.

```
curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    ...
  },
  "suggest" : {
    ...
  }
}'
```

Several suggestions can be specified per request. Each suggestion is identified with an arbitrary name. In the example below two suggestions are requested. Both the `my-suggest-1` and `my-suggest-2` suggestions use the `fuzzy` suggester, but have a different `text`.

```
"suggest" : {
  "my-suggest-1" : {
    "text" : "the amsterdma meetpu",
    "fuzzy" : {
      "field" : "body"
    }
  },
  "my-suggest-2" : {
    "text" : "the rottredam meetpu",
    "fuzzy" : {
      "field" : "title"
    }
  }
}
```

The suggest response example below includes the suggestion response for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains entries. Each entry is effectively a token from the suggest text and contains the suggestion entry text, the original start offset and length in the suggest text and, if found, an arbitrary number of options.

```
{
  ...
  "suggest": {
    "my-suggest-1": [
      {
        "text" : "amsterdma",
        "offset": 4,
        "length": 9,
        "options": [
           ...
        ]
      },
      ...
    ],
    "my-suggest-2" : [
      ...
    ]
  }
  ...
}
```

Each options array contains option objects that include the suggested text, its document frequency and its score compared to the suggest entry text. The meaning of the score depends on the suggester used. The fuzzy suggester's score is based on the edit distance.

```
"options": [
  {
    "text": "amsterdam",
    "freq": 77,
    "score": 0.8888889
  },
  ...
]
```
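
To make the relation between edit distance and score concrete, here is a hedged Python sketch. The formula `1 - distance / min(len(a), len(b))`, with an adjacent transposition counted as a single edit, is an inference from the example scores in this message and not something the message states; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Edit distance where an adjacent transposition counts as one edit
    (optimal string alignment variant of Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def similarity(a, b):
    # Assumed formula, chosen because it reproduces the example scores.
    return 1.0 - edit_distance(a, b) / min(len(a), len(b))
```

Under this assumption `amsterdma` and `amsterdam` differ by one transposition over 9 characters, giving 1 - 1/9, which rounds to the 0.8888889 shown above.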

# Global suggest text

To avoid repetition of the suggest text, it is possible to define a global text. In the example below the suggest text is defined globally and applies to the `my-suggest-1` and `my-suggest-2` suggestions.

```
"suggest" : {
  "text" : "the amsterdma meetpu",
  "my-suggest-1" : {
    "fuzzy" : {
      "field" : "title"
    }
  },
  "my-suggest-2" : {
    "fuzzy" : {
      "field" : "body"
    }
  }
}
```

In the above example the suggest text can also be specified as a suggestion-specific option. The suggest text specified on the suggestion level overrides the suggest text on the global level.
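
The override rule can be sketched as follows. This is an illustrative Python sketch over the request structure shown above, not the actual parsing code; the function name `resolve_suggest_texts` is hypothetical.

```python
def resolve_suggest_texts(suggest):
    """Return the effective suggest text for each named suggestion.

    `suggest` mirrors the "suggest" request part: an optional global
    "text" entry plus one entry per named suggestion.
    """
    global_text = suggest.get("text")
    resolved = {}
    for name, body in suggest.items():
        if name == "text":
            continue  # the global text is not a suggestion itself
        # The suggestion-level text wins over the global text.
        text = body.get("text", global_text)
        if text is None:
            raise ValueError("no suggest text for suggestion %r" % name)
        resolved[name] = text
    return resolved
```

A suggestion that defines its own `text` keeps it, while every other suggestion falls back to the global one.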

# Other suggest example.

In the example below we request suggestions for the suggest text `devloping distibutd saerch engies` on the `title` field, with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but it is a nice optimization. The suggestions are gathered in the `query` phase, and in the case that we only care about suggestions (so no hits) we don't need to execute the `fetch` phase.

```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
  "suggest" : {
    "my-title-suggestions-1" : {
      "text" : "devloping distibutd saerch engies",
      "fuzzy" : {
        "size" : 3,
        "field" : "title"
      }
    }
  }
}'
```

The above request could yield the response as stated in the code example below. As you can see if we take the first suggested options of each suggestion entry we get `developing distributed search engines` as result.

```
{
  ...
  "suggest": {
    "my-title-suggestions-1": [
      {
        "text": "devloping",
        "offset": 0,
        "length": 9,
        "options": [
          {
            "text": "developing",
            "freq": 77,
            "score": 0.8888889
          },
          {
            "text": "deloping",
            "freq": 1,
            "score": 0.875
          },
          {
            "text": "deploying",
            "freq": 2,
            "score": 0.7777778
          }
        ]
      },
      {
        "text": "distibutd",
        "offset": 10,
        "length": 9,
        "options": [
          {
            "text": "distributed",
            "freq": 217,
            "score": 0.7777778
          },
          {
            "text": "disributed",
            "freq": 1,
            "score": 0.7777778
          },
          {
            "text": "distribute",
            "freq": 1,
            "score": 0.7777778
          }
        ]
      },
      {
        "text": "saerch",
        "offset": 20,
        "length": 6,
        "options": [
          {
            "text": "search",
            "freq": 1038,
            "score": 0.8333333
          },
          {
            "text": "smerch",
            "freq": 3,
            "score": 0.8333333
          },
          {
            "text": "serch",
            "freq": 2,
            "score": 0.8
          }
        ]
      },
      {
        "text": "engies",
        "offset": 27,
        "length": 6,
        "options": [
          {
            "text": "engines",
            "freq": 568,
            "score": 0.8333333
          },
          {
            "text": "engles",
            "freq": 3,
            "score": 0.8333333
          },
          {
            "text": "eggies",
            "freq": 1,
            "score": 0.8333333
          }
        ]
      }
    ]
  }
  ...
}
```

# Common suggest options:
* `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.

# Common fuzzy suggest options
* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum corrections to be returned per suggest text token.
* `sort` - Defines how suggestions should be sorted per suggest text term. Two possible values:
** `score` - Sort by score first, then document frequency and then the term itself.
** `frequency` - Sort by document frequency first, then similarity score and then the term itself.
* `suggest_mode` - The suggest mode controls what suggestions are included or controls for what suggest text terms, suggestions should be suggested. Three possible values can be specified:
** `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
** `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
** `always` - Suggest any matching suggestions based on terms in the suggest text.

# Other fuzzy suggest options:
* `lowercase_terms` - Lowercases the suggest text terms after text analysis.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimum number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur at the beginning of terms.
* `min_query_length` -  The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Due to the fact that terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise.
* `max_inspections` - A factor that is multiplied with the `shard_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimum number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. This can improve quality by only suggesting high-frequency terms. Defaults to 0f, which means the option is not enabled. If a value higher than 1 is specified, the number cannot be fractional. The shard-level document frequencies are used for this option.
* `max_query_frequency` - The maximum number of documents a suggest text token can exist in in order to be included. Can be a relative percentage (e.g. 0.4) or an absolute number representing document frequencies. If a value higher than 1 is specified, it cannot be fractional. Defaults to 0.01f. This can be used to exclude high-frequency terms from being spellchecked. High-frequency terms are usually spelled correctly; on top of this, excluding them also improves spellcheck performance. The shard-level document frequencies are used for this option.
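
The absolute versus relative interpretation shared by `threshold_frequency` and `max_query_frequency` can be sketched as below. This is a minimal Python sketch under stated assumptions: the function name `absolute_doc_freq` is hypothetical, a value of exactly 1 is treated as absolute, and relative values are truncated to a whole number of documents. None of these details are confirmed by this message.

```python
def absolute_doc_freq(value, num_docs):
    """Turn a frequency option into an absolute document count.

    value    -- the configured option (fraction below 1, or a whole number)
    num_docs -- number of documents on the shard (shard-level frequencies)
    """
    if value < 0:
        raise ValueError("frequency option must not be negative")
    if value >= 1:
        # Absolute document frequency: fractional values are rejected.
        if value != int(value):
            raise ValueError("absolute frequencies cannot be fractional")
        return int(value)
    # Relative frequency: a fraction of the documents on the shard
    # (assumption: truncated towards zero).
    return int(value * num_docs)
```

With 200 documents on a shard, a setting of 0.25 corresponds to 50 documents, a setting of 5 means 5 documents, and a setting of 2.5 is rejected.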
2013-01-28 15:18:18 +01:00
Simon Willnauer 48488f707f Expose CommonTermsQuery in Match & MultiMatch and enable highlighting
Closes #2591
2013-01-28 11:57:05 +01:00
Simon Willnauer 5c89d66216 move ShardsAllocatorModuleTests to o.e.t.integration 2013-01-25 22:26:30 +01:00
Shay Banon 042a5d02d9 Primary shard failure with initializing replica shards can cause the replica shard to cause allocation failures
fixes #2592
2013-01-25 17:59:01 +01:00
Shay Banon 990acff4f7 make sure we wait for yellow stats in suggest API when searching on clean index 2013-01-24 22:31:51 +01:00
Martijn van Groningen 9013eeae8a Added filter support in the `has_child` and `has_parent` filters.
Example:
```
curl -XPOST 'localhost:9200/_search' -d '{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "title": "distributed systems"
        }
      },
      "filter": {
        "has_child": {
          "type": "tag",
          "filter": {
            "term": {
              "name": "book"
            }
          }
        }
      }
    }
  }
}'
```

Closes #2585
2013-01-24 21:32:38 +01:00
Martijn van Groningen 98a674fc6e Added suggest api.
# Suggest feature
The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment the only supported suggester is `fuzzy`. The suggest feature is available since version `0.21.0`.

# Fuzzy suggester
The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query into account that is part of request.

# Suggest API
The suggest request part is defined alongside the query part as a top-level field in the JSON request.

```
curl -s -XPOST 'localhost:9200/_search' -d '{
    "query" : {
        ...
    },
    "suggest" : {
        ...
    }
}'
```

Several suggestions can be specified per request. Each suggestion is identified with an arbitrary name. In the example below two suggestions are requested. The `my-suggest-1` suggestion uses the `body` field and `my-suggest-2` uses the `title` field. The `type` field is a required field and defines what suggester to use for a suggestion.

```
"suggest" : {
    "suggestions" : {
        "my-suggest-1" : {
            "type" : "fuzzy",
            "field" : "body",
            "text" : "the amsterdma meetpu"
        },
        "my-suggest-2" : {
            "type" : "fuzzy",
            "field" : "title",
            "text" : "the rottredam meetpu"
        }
    }
}
```

The suggest response example below includes the suggestions part for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains a terms array that contains all terms output by analyzing the suggest text. Each term object includes the term itself, the original start and end offset in the suggest text and, if found, an arbitrary number of suggestions.

```
{
    ...
    "suggest": {
        "my-suggest-1": {
            "terms" : [
              {
                "term" : "amsterdma",
                "start_offset": 4,
                "end_offset": 13,
                "suggestions": [
                   ...
                ]
              }
              ...
            ]
        },
        "my-suggest-2" : {
          "terms" : [
            ...
          ]
        }
    }
    ...
}
```

Each suggestions array contains a suggestion object that includes the suggested term, its document frequency and score compared to the suggest text term. The meaning of the score depends on the used suggester. The fuzzy suggester's score is based on the edit distance.

```
"suggestions": [
    {
        "term": "amsterdam",
        "frequency": 77,
        "score": 0.8888889
    },
    ...
]
```

# Global suggest text

To avoid repetition of the suggest text, it is possible to define a global text. In the example below the suggest text is a global option and applies to the `my-suggest-1` and `my-suggest-2` suggestions.

```
"suggest" : {
    "suggestions" : {
        "text" : "the amsterdma meetpu",
        "my-suggest-1" : {
            "type" : "fuzzy",
            "field" : "title"
        },
        "my-suggest-2" : {
            "type" : "fuzzy",
            "field" : "body"
        }
    }
}
```

The suggest text can be specified as a global option or as a suggestion-specific option. The suggest text specified on the suggestion level overrides the suggest text on the global level.

# Other suggest example.

In the example below we request suggestions for the suggest text `devloping distibutd saerch engies` on the `title` field, with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but it is a nice optimization. The suggestions are gathered in the `query` phase, and in the case that we only care about suggestions (so no hits) we don't need to execute the `fetch` phase.

```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
  "suggest" : {
      "suggestions" : {
        "my-title-suggestions" : {
          "suggester" : "fuzzy",
          "field" : "title",
          "text" : "devloping distibutd saerch engies",
          "size" : 3
        }
      }
  }
}'
```

The above request could yield the response as stated in the code example below. As you can see if we take the first suggested term of each suggest text term we get `developing distributed search engines` as result.

```
{
  ...
  "suggest": {
    "my-title-suggestions": {
      "terms": [
        {
          "term": "devloping",
          "start_offset": 0,
          "end_offset": 9,
          "suggestions": [
            {
              "term": "developing",
              "frequency": 77,
              "score": 0.8888889
            },
            {
              "term": "deloping",
              "frequency": 1,
              "score": 0.875
            },
            {
              "term": "deploying",
              "frequency": 2,
              "score": 0.7777778
            }
          ]
        },
        {
          "term": "distibutd",
          "start_offset": 10,
          "end_offset": 19,
          "suggestions": [
            {
              "term": "distributed",
              "frequency": 217,
              "score": 0.7777778
            },
            {
              "term": "disributed",
              "frequency": 1,
              "score": 0.7777778
            },
            {
              "term": "distribute",
              "frequency": 1,
              "score": 0.7777778
            }
          ]
        },
        {
          "term": "saerch",
          "start_offset": 20,
          "end_offset": 26,
          "suggestions": [
            {
              "term": "search",
              "frequency": 1038,
              "score": 0.8333333
            },
            {
              "term": "smerch",
              "frequency": 3,
              "score": 0.8333333
            },
            {
              "term": "serch",
              "frequency": 2,
              "score": 0.8
            }
          ]
        },
        {
          "term": "engies",
          "start_offset": 27,
          "end_offset": 33,
          "suggestions": [
            {
              "term": "engines",
              "frequency": 568,
              "score": 0.8333333
            },
            {
              "term": "engles",
              "frequency": 3,
              "score": 0.8333333
            },
            {
              "term": "eggies",
              "frequency": 1,
              "score": 0.8333333
            }
          ]
        }
      ]
    }
  }
  ...
}
```
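Picking the first suggestion per term can be sketched as below. The sketch assumes `start_offset` is inclusive and `end_offset` is exclusive, which is consistent with the offsets in the sample response. The helper name is hypothetical.

```python
def apply_first_suggestions(text: str, terms: list) -> str:
    """Replace each suggest text term with its first suggestion, if any."""
    parts = []
    cursor = 0
    for entry in terms:
        # Keep whatever sits between the previous term and this one.
        parts.append(text[cursor:entry["start_offset"]])
        options = entry["suggestions"]
        parts.append(options[0]["term"] if options else entry["term"])
        cursor = entry["end_offset"]
    parts.append(text[cursor:])
    return "".join(parts)


terms = [
    {"term": "devloping", "start_offset": 0, "end_offset": 9,
     "suggestions": [{"term": "developing"}]},
    {"term": "distibutd", "start_offset": 10, "end_offset": 19,
     "suggestions": [{"term": "distributed"}]},
    {"term": "saerch", "start_offset": 20, "end_offset": 26,
     "suggestions": [{"term": "search"}]},
    {"term": "engies", "start_offset": 27, "end_offset": 33,
     "suggestions": [{"term": "engines"}]},
]
print(apply_first_suggestions("devloping distibutd saerch engies", terms))
```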

# Common suggest options:
* `suggester` - The suggester implementation type. The only supported value is 'fuzzy'. This is a required option.
* `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.

# Common fuzzy suggest options
* `field` - The field to fetch the candidate suggestions from. This is a required option that needs to be set either globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum number of corrections to be returned per suggest text token.
* `sort` - Defines how suggestions should be sorted per suggest text term. Two possible values:
** `score` - Sort by score first, then document frequency and then the term itself.
** `frequency` - Sort by document frequency first, then similarity score and then the term itself.
* `suggest_mode` - Controls which suggestions are included, or for which suggest text terms suggestions should be made. Three possible values can be specified:
** `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
** `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
** `always` - Suggest any matching suggestions based on terms in the suggest text.
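The two `sort` modes can be sketched as sort keys over suggestion objects. The sketch assumes that score and frequency sort descending and that the term itself breaks ties in ascending order. The direction of the tie-break is an assumption, and the function name is hypothetical.

```python
def sort_suggestions(suggestions: list, sort: str = "score") -> list:
    """Order suggestion dicts according to the `sort` option."""
    if sort == "score":
        key = lambda s: (-s["score"], -s["frequency"], s["term"])
    elif sort == "frequency":
        key = lambda s: (-s["frequency"], -s["score"], s["term"])
    else:
        raise ValueError(f"unknown sort mode: {sort}")
    return sorted(suggestions, key=key)


candidates = [
    {"term": "distribute", "frequency": 1, "score": 0.7777778},
    {"term": "disributed", "frequency": 1, "score": 0.7777778},
    {"term": "distributed", "frequency": 217, "score": 0.7777778},
]
print([s["term"] for s in sort_suggestions(candidates, "score")])
```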

# Other fuzzy suggest options:
* `lowercase_terms` - Lowercases the suggest text terms after text analysis.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be 1 or 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimum number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur at the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than `size` can be useful in order to get a more accurate document frequency for spelling corrections, at the cost of performance. Because terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise.
* `max_inspections` - A factor that is multiplied with `shard_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f, which means it is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified then it cannot be fractional. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly, and excluding them also improves spellcheck performance. The shard level document frequencies are used for this option.
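How `threshold_frequency` and `max_query_frequency` distinguish absolute from relative values can be sketched as follows. The sketch assumes that values below 1 are fractions of the shard's document count and that values of 1 or more are absolute counts. The handling of exactly 1, the truncation of the fraction, and the function name are all assumptions.

```python
def resolve_frequency(value: float, docs_in_shard: int) -> int:
    """Turn a frequency option into an absolute document count for one shard."""
    if value < 0:
        raise ValueError("frequency must not be negative")
    if value >= 1:
        # Absolute document count: fractional values are rejected.
        if value != int(value):
            raise ValueError("values of 1 or more cannot be fractional")
        return int(value)
    # Relative value: a fraction of the documents in the shard.
    return int(value * docs_in_shard)


print(resolve_frequency(0.25, 1000))  # relative
print(resolve_frequency(5, 1000))     # absolute
```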

 Closes #2585
2013-01-24 15:41:06 +01:00
Simon Willnauer 4eefcb9c82 Expose CommonTermsQuery
Closes #2583
2013-01-24 14:18:01 +01:00
Shay Banon c2f35621f6 allow to get settings as delimited string 2013-01-24 12:03:16 +01:00
Shay Banon b143822bac allow to load settings from delimited string 2013-01-24 12:00:14 +01:00
Simon Willnauer 88f68264c7 Reuse MemoryIndex instances across Percolator requests.
* added configurable MemoryIndexPool that pools MemoryIndex instance across Threads
* Pool can be configured based on the number of pooled instances as well as the maximum number of bytes that is reused across the pooled instances

Closes #2581
2013-01-24 11:53:21 +01:00
Shay Banon e8c1180ede add field data stats 2013-01-24 11:38:18 +01:00
Shay Banon 613b746299 move field data type to simply be type and settings 2013-01-24 09:33:16 +01:00
Martijn van Groningen 346422b747 Added sparse multi ordinals implementation for field data. 2013-01-23 22:11:31 +01:00
Shay Banon a74e7f8099 refactor geo to extract common classes 2013-01-23 14:14:21 +01:00
Shay Banon d969e61999 Remove settings option for index store compression, compression is always enabled
closes #2577
2013-01-23 13:11:48 +01:00
Simon Willnauer 2880cd0172 Upgrade to Lucene 4.1
* Removed CustomMemoryIndex in favor of MemoryIndex which as of 4.1 supports adding the same field twice
* Replaced duplicated logic in X[*]FSDirectory for rate limiting with a RateLimitedFSDirectory wrapper
* Remove hacks to find out merge context in rate limiting in favor of IOContext
* replaced Scorer#freq() return type (from float to int)
* Upgraded FVHighlighter to new 'centered' highlighting
* Fixed RobinEngine to use separate setCommitData
2013-01-23 11:54:11 +01:00
Igor Motov bbfd3957eb Improve stability of the testNodesInfos test 2013-01-22 12:29:38 -05:00
Igor Motov 9becdb814a Improve stability of the shardsCleanup test 2013-01-22 10:20:18 -05:00
Shay Banon c295211a85 final move to new field data 2013-01-22 16:16:33 +01:00
uboness 09cc70b8c9 added predefined empty implementation for all atomic field datas 2013-01-22 16:16:33 +01:00
Shay Banon 772ee9db54 move terms to use new field data 2013-01-22 16:16:32 +01:00
Shay Banon e5b651321f remove some safe methods because of the new makeSafe method usage 2013-01-22 16:16:32 +01:00
Shay Banon 5b7173fc35 move sorting to work with new field data 2013-01-22 16:16:32 +01:00
Shay Banon 45f27fe96a add packed bytes variant for strings/bytes 2013-01-22 16:16:32 +01:00
uboness 855b64a8a7 byte field data implementation 2013-01-22 16:16:31 +01:00
uboness f1f3c241fd short field data implementation 2013-01-22 16:16:31 +01:00
uboness 3840439365 float field data implementation 2013-01-22 16:16:31 +01:00
uboness fc09ce7ac9 Implemented int field data 2013-01-22 16:16:31 +01:00
Shay Banon e0b280f9b3 use FieldMapper.Names for fieldNames, and not just fieldName as string 2013-01-22 16:16:30 +01:00
Shay Banon 7dc5cf9799 add long field support 2013-01-22 16:16:30 +01:00
Shay Banon 7397007e05 initial commit 2013-01-22 16:16:30 +01:00
Simon Willnauer 35cf9ee11d wait for cluster to be formed in SimpleNodesInfoTests 2013-01-19 15:44:26 +01:00
Simon Willnauer d6b613ac8c Respect lowercase_expanded_terms in MappingQueryParser
Fixes #2566
2013-01-19 13:57:45 +01:00
Simon Willnauer c563248f76 testMoreLikeThisIssue2197 should create index mapping first to prevent races 2013-01-18 16:41:37 +01:00
Simon Willnauer 6f38a3a8a8 create index and mapping first to ensure all relevant nodes see the mapping 2013-01-18 16:09:24 +01:00
Simon Willnauer 393de984bd Remove deprecated StreamInput/Output#read/writeUTF 2013-01-17 22:38:42 +01:00
Simon Willnauer 3d80c53192 Allow ShardsAllocator to be configured via node level settings.
* Default ShardsAllocator is set to BalancedShardsAllocator
* Core ShardsAllocator implementations can be defined via 'cluster.routing.allocation.type'
* Core ShardsAllocator implementations are exposed via short keys 'balanced' (BalancedShardsAllocator) and 'even_shards' (EvenShardsCountAllocator)
* Third party allocators can be loaded via fully-qualified class names.
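The lookup described by these bullets can be sketched as below. Only the setting name, the two short keys and the simple class names come from this message. The function name is hypothetical, and `com.example.MyAllocator` is a made-up third-party class name used for illustration.

```python
# Short keys for the core allocators, as listed in the commit message.
CORE_ALLOCATORS = {
    "balanced": "BalancedShardsAllocator",
    "even_shards": "EvenShardsCountAllocator",
}


def resolve_allocator(node_settings: dict) -> str:
    """Return the allocator class name selected by the node level settings."""
    # BalancedShardsAllocator is the default when nothing is configured.
    value = node_settings.get("cluster.routing.allocation.type", "balanced")
    # Anything that is not a short key is treated as a fully-qualified class name.
    return CORE_ALLOCATORS.get(value, value)


print(resolve_allocator({}))
print(resolve_allocator({"cluster.routing.allocation.type": "even_shards"}))
print(resolve_allocator({"cluster.routing.allocation.type": "com.example.MyAllocator"}))
```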

Closes #2557
2013-01-17 16:23:52 +01:00
Simon Willnauer 2eb09e6b1a Added BalancedShardsAllocator that balances shards based on a weight function.
* Weights are calculated per index and incorporate index level, global and primary related parameters
 * Balance operations are executed based on a win maximization strategy that tries to relocate shards
   first that offer the biggest gain towards the weight function's optimum
 * The WeightFunction allows settings to prefer index based balance over global balance and vice versa
 * Balance operations can be throttled by raising a threshold resulting in less aggressive balance operations
 * WeightFunction ships with defaults to achieve evenly distributed indexes while maintaining a global balance

Closes #2555
2013-01-17 12:02:42 +01:00
Igor Motov d97839b8a8 Fix char filter issues introduced during lucene 4 migration
Fixes #2543
2013-01-14 12:43:02 -05:00
Igor Motov 6243f8e64d Disallow unknown custom indexing parameters
Fixes #2354
2013-01-11 10:14:25 -05:00
Martijn van Groningen 1ce10dfb06 Fixed issue where parent & child queries can fail if a segment doesn't have documents with the targeted type or associated parent type
Closes #2537
2013-01-11 16:06:14 +01:00
Shay Banon 2c4b9d9ba2 cleanup queryHint since it was never used
preference ended up as the way to control routing
2013-01-07 04:02:45 +01:00
Shay Banon 0e5287f1f2 Binary Mapped Fields: Allow to not store them by default, and return BytesReference
fixes #2523
also, fix another point of normalization of the result for get API
2013-01-05 01:50:46 +01:00
Shay Banon 4b9fcdb900 normalize the value even when getting it from source
we need to in order to properly handle bytes, and normalize Integer to Long for example for consistency, the fact that mappers now handle different Objects helps here
2013-01-04 23:55:34 +01:00
Shay Banon bf4c442509 add refresh before calling count 2013-01-04 08:38:00 +01:00
Shay Banon 70f1e2c987 remove problematic timeout test
the timeout feature, even if set to 0, might still mean we get an ack back...
2013-01-04 08:19:01 +01:00
Igor Motov d73a6663b7 Changing non-nested object mapping to nested should fail
Fixes #2518
2013-01-03 18:18:40 -05:00
uboness dc25939b7c fixed hunspell test to clean up properly, this time, for realz 2013-01-03 22:54:37 +01:00
uboness 86f55b3a45 fixed hunspell test to clean up properly 2013-01-03 22:12:36 +01:00
Martijn van Groningen 7cf80aca99 Changed how the stored values of numeric fields are stored in the index. Before numeric values were stored in binary representation, now the values in numeric representation. 2013-01-03 21:34:53 +01:00
uboness 6c4108b38a Support for hunspell token filter
Closes: #646

- Introduced HunspellService which holds a repository of hunspell dictionaries
- It is possible to register a dictionary via a plugin or by placing the dictionary files on the file system
2013-01-02 03:51:26 +01:00
Shay Banon b6f766af3f backport lucene 4.1 terms filter and use it where applicable 2012-12-29 10:39:53 -08:00
Shay Banon b08e8fb76c add explicit termsFilter in mapper, and use that in terms filter
This also enabled support for terms filter on _id field for example
2012-12-29 01:06:32 -08:00
Shay Banon 01ba287164 more mapper simplification, reduce the value methods 2012-12-28 20:33:10 -08:00
Shay Banon e02015c641 Use FieldType and not deprecated Field construction 2012-12-28 14:27:09 -08:00
Shay Banon 64a01c28c3 rename fieldQuery/fieldFilter to termQuery/termFilter in mappers 2012-12-28 13:48:48 -08:00
Shay Banon 7fb98769a6 add a sleep to make sure settings are applied 2012-12-27 14:16:30 -08:00
Igor Motov b7ff23ff93 Update settings: Allow to dynamically update thread pool settings
Closes #2509
2012-12-27 09:39:27 -05:00
Simon Willnauer 750c30f0b8 allow index and type to be specified as arrays in MultiSearchRequest 2012-12-26 16:18:17 -08:00
Shay Banon ef55e4feec fix failed tests due to wrapping failures with mapping parsing exception 2012-12-26 15:59:07 -08:00
Shay Banon 8a17222ff2 match_all filter with empty array (instead of obj) fires exception when used with facets
fixes #2493
2012-12-26 15:35:30 -08:00
Shay Banon 2f4b759df7 Allow highlighting on wildcard fields.. ie, comment_*
closes #2396
2012-12-26 15:00:31 -08:00
Shay Banon 4b69846ba2 cleanup 2012-12-26 14:21:04 -08:00
Simon Willnauer 90bd82ac50 Pass topScorer=false to sub-scorers if a scorer is wrapped. Wrapped BooleanQuery can return collect-only scorers. See #2505 2012-12-26 14:20:14 -08:00
Martijn van Groningen c93babed42 Minor changes to the parent / child benchmark. 2012-12-24 22:12:10 +01:00
Martijn van Groningen c6aaefa27f Improved explain support for nested query.
Closes #2503
2012-12-24 13:40:20 +01:00
Martijn van Groningen d57d89937f Added scoring support to `has_child` and `has_parent` queries.
Added score support to `has_child` and `has_parent` queries. Both queries support a `score_type` option. The `has_child` query supports the same options as the `top_children` query, plus the `none` option, which is the default and yields the current behaviour. The `has_parent` query supports the score type options `score` and `none`. The latter is the default and yields the current behaviour.

If the `score_type` is set to a value other than `none`, the `has_parent` query maps the matched parent's score onto the related child documents, and the `has_child` query maps the matched children's scores onto the related parent document. The `score_type` on both queries defines how the scores are mapped between child and parent documents. Both queries are executed in two phases. The first phase collects the parent uid values of matching documents, with an aggregated score per parent uid value. In the second phase, either child or parent typed documents that have the same parent uid value as found during the first phase are emitted as hits. The score computed in the first phase is used as the score.
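The first phase of a `has_child` query can be sketched as an aggregation of child scores per parent uid. The sketch assumes the `top_children` score options are `max`, `sum` and `avg`, and uses a constant `1.0` for `none`. The constant and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_parent_scores(child_hits, score_type: str = "none") -> dict:
    """Phase one: collect parent uids of matching children with one score each.

    `child_hits` is an iterable of (parent_uid, child_score) pairs.
    """
    reducers = {
        "max": max,
        "sum": sum,
        "avg": lambda scores: sum(scores) / len(scores),
        "none": lambda scores: 1.0,  # assumed constant score
    }
    if score_type not in reducers:
        raise ValueError(f"unknown score_type: {score_type}")
    per_parent = defaultdict(list)
    for parent_uid, child_score in child_hits:
        per_parent[parent_uid].append(child_score)
    reduce_scores = reducers[score_type]
    return {uid: reduce_scores(scores) for uid, scores in per_parent.items()}


hits = [("p1", 1.0), ("p1", 3.0), ("p2", 2.0)]
print(aggregate_parent_scores(hits, "avg"))
```

In the second phase the parent documents whose uid appears in the returned mapping would be emitted as hits, each with the aggregated score.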

Closes #2502
2012-12-24 11:39:43 +01:00
Igor Motov fcdc36977c Fix failure message serialization in MultiSearchResponse
Fixes #2498
2012-12-21 19:26:48 -05:00
Martijn van Groningen 08b026d060 Fixed `top_children` query failure with dfs_query* search types.
Fixed error with the top_children query when `DFS_QUERY_*` is used as search_type and it wraps a query that gets rewritten (e.g. a wildcard query).

Closes #2501
2012-12-21 18:08:44 +01:00
Martijn van Groningen 694989141b Fixed AOBE when using `top_children` in a must not clause.
Closes #2500
2012-12-21 16:47:03 +01:00
Martijn van Groningen 826a6ab02a Improved XBooleanFilter by adding drive logic for bit based filter impl and adding unit test, which tests all possible XBooleanFilter options. 2012-12-19 22:43:47 +01:00
Shay Banon 14678a91ab nested path to be represented as bytes as well as string 2012-12-18 15:39:13 -08:00
Shay Banon ac253178bd more cleanup in mappings 2012-12-18 13:28:36 -08:00
Shay Banon 1867ef5084 simplify toXContent generation of field mappers 2012-12-18 13:00:47 -08:00
Martijn van Groningen ddea22771e Fixed mlt api bug related to custom routing value.
If a routing value isn't id based, the get part of the mlt request couldn't retrieve the document for the second part of the mlt request, and a 500 code was returned instead. This fix addresses this issue.

Closes #2489
2012-12-17 11:00:30 +01:00
uboness 8b74c42099 Support for RegexpQuery & RegexpFilter
- Added "regexp" query type (based on Lucene 4 RegexpQuery)
- Added "regexp" filter type
- Fixed a bug in IdFieldMapper where prefixQuery on a single type would be redundantly wrapped in a boolean query
2012-12-16 23:24:18 +01:00
Igor Motov c8285739d2 Correctly parse *:* query into matchAllDocsQuery
Fixes #2486
2012-12-14 14:36:20 -08:00
Shay Banon 36fd76b826 don't call toLowerCase on each bulk item 2012-12-12 21:56:23 -08:00
Martijn van Groningen ea9a4d70cf lucene 4: Removed the usage of Document & Field when retrieving stored fields. 2012-12-06 18:18:52 +01:00
Igor Motov d947dfde2b Add support for ignoring settings in system properties.
An elasticsearch node can be instructed to ignore settings specified in system properties by setting config.ignore_system_properties setting to true.
2012-12-06 09:37:36 -05:00
Martijn van Groningen f72d5c1907 Expose fragmenter option for plain / normal highlighter.
Closes #2465
2012-12-06 14:59:42 +01:00
Shay Banon c22b521800 fix properly handling acceptDocs in filters
our idea is to apply it on the "filtered/constant" level, and not on compound filters, so we won't apply it multiple times. The solution is a bit conservative for now, we can further optimize it in the future, for example, not to wrap it when no caching is done within the filter chain
2012-12-06 01:55:16 +01:00
Martijn van Groningen 6cfd938dce Fixed unable to highlight on all multi-valued field values.
Closes #2384
2012-12-03 12:39:18 +01:00
Shay Banon f17ad829ac remove snappy support
relates to #2459
2012-12-03 12:30:13 +01:00
Shay Banon a2a8553faf Indexing Slow Log
closes #2457
2012-12-03 10:21:59 +01:00
Igor Motov 6021515567 The relevancy score in explanation should match the actual score in custom_filters_query
Fixes #2441
2012-11-27 10:13:16 -08:00
Shay Banon 69ef822da6 cleanup docsets
- remove the DocSet abstraction, and use Bits where we can by getting it from DocIdSet
- better handling of acceptDocs, though still need to properly apply them when caching is involved
2012-11-27 10:04:21 -08:00
Igor Motov fb9143aac1 fix sporadically disappearing fields during concurrent dynamic mapping updates 2012-11-24 14:02:58 +01:00
Simon Willnauer 32a0772821 #2436 expose KeepWordTokenFilter by default 2012-11-23 10:11:30 +01:00
Igor Motov 65a43d3ad4 Fix handling of stop word _lang_ notation
Fixes #2412
2012-11-23 09:54:02 +01:00
Shay Banon e1679b89bb fix failed test that were using the wrong form match query 2012-11-22 15:14:02 +01:00
Shay Banon 192cf5298a fix failed test that were using the wrong form match query 2012-11-22 14:44:03 +01:00
Chris Male 2541847945 Added control over Query used by MatchQuery when there are zero terms after analysis 2012-11-22 22:13:29 +13:00
Chris Male 9e2469e04f Add per-field Similarity support 2012-11-21 12:44:59 +13:00
Martijn van Groningen be70722de7 Renamed pulsing40 and Lucene40 postings format providers to pulsing and default respectively for more consistent naming in settings. 2012-11-15 09:54:00 +01:00
Martijn van Groningen 20c6085852 changed test method names. 2012-11-15 09:40:24 +01:00
Martijn van Groningen e80f74584b Added licence header. 2012-11-15 00:18:49 +01:00
Martijn van Groningen fd5bd102aa lucene 4: Exposed Lucene's codec api
This feature adds the option to configure a `PostingsFormat` and assign it to a field in the mapping. This feature is very expert and in almost all cases Elasticsearch's defaults will suit your needs.

## Configuring a postingsformat per field

Several postings formats are configured by default and can be used in your mapping:
* `direct` - A codec that wraps the default postings format during write time, but loads the terms and postinglists directly into memory during read time as raw arrays. This postings format is exceptionally memory intensive, but can give a substantial increase in search performance.
* `memory` - A codec that loads and stores terms and postinglists in memory using a FST. Acts like a cached postingslist.
* `bloom_default` - Maintains a bloom filter for the indexed terms, which is stored to disk and builds on top of the `default` postings format. This postings format is useful for low document frequency terms and offers a fail fast for seeks to terms that don't exist.
* `bloom_pulsing` - Similar to the `bloom_default` postings format, but builds on top of the `pulsing` postings format.
* `default` - The default postings format. The default if none is specified.

On all fields it is possible to configure a `postings_format` attribute. Example mapping:
```
{
  "person" : {
     "properties" : {
         "second_person_id" : {"type" : "string", "postings_format" : "pulsing"}
     }
  }
}
```

## Configuring a custom postingsformat
It is possible to instantiate custom postings formats. This can be specified via the index settings.
```
{
   "codec" : {
      "postings_format" : {
         "my_format" : {
            "type" : "pulsing40",
            "freq_cut_off" : "5"
         }
      }
   }
}
```
In the above example the `freq_cut_off` is set to 5 (defaults to 1). This tells the pulsing postings format to inline the postinglist of terms with a document frequency lower than or equal to 5 in the term dictionary.

Closes #2411
2012-11-14 23:54:29 +01:00
Igor Motov 120560bd0a Using non-mapped fields in prefix queries shouldn't cause NullPointerException
Fixes #2408
2012-11-14 18:34:54 +01:00
Igor Motov f47d62cc30 Date fields shouldn't be returned as longs by Get API 2012-11-13 21:36:28 +01:00
Igor Motov d1281d283b Add `index.routing.allocation.require....` and `cluster.routing.allocation.require....` settings
Fixes #2404
2012-11-13 19:29:20 +01:00
Martijn van Groningen 978c95649e lucene 4: Fixed SimpleQueryTests 2012-11-12 13:44:42 +01:00
Martijn van Groningen 05746adeb2 lucene 4: Set number of replicas to 0. Makes the test run faster. 2012-11-12 13:44:42 +01:00
Igor Motov c2f3eab7d3 lucene 4: fix sorting 2012-11-12 13:44:42 +01:00
uboness cae66fb636 * lucene 4: added missing short support in stream input/output
* lucene 4: added more extensive test for stored fields
2012-11-12 13:44:41 +01:00
Igor Motov f8842d5a4f lucene 4: fix TokenFilterTests 2012-11-12 13:44:41 +01:00
Shay Banon 9d5cae23fa lucene 4: fix general mapping test
no need to test for boost, we already have specific boost tests, in general, we should get rid of this test, and use more specialized tests if we are missing some
2012-11-12 13:44:41 +01:00
Shay Banon 5c45aad260 lucene 4: fix boost mapping tests 2012-11-12 13:44:41 +01:00
Igor Motov 3f3a95668b lucene4: add support for omit_norm setting to numeric types and don't omit norms if boost is not 1.0
This commit enables setting boost for numeric fields. However, there is still no way to take advantage of boosted numeric fields during searching because all queries against numeric fields are translated into range queries wrapped in ConstantScore. Boost for numeric fields is broken on master as well https://gist.github.com/7ecedea4f6a5219efb89
2012-11-12 13:44:40 +01:00
Igor Motov 2fb3591792 lucene4: fixed default values tests to refer to correct default FieldType constants 2012-11-12 13:44:40 +01:00
Igor Motov a5bef30be9 lucene4: fixed CompressIndexInputOutputTests 2012-11-12 13:44:40 +01:00
Igor Motov 3816366780 lucene4: fixed SimpleAllMapperTests 2012-11-12 13:44:40 +01:00
Shay Banon a38064913f lucene 4: fix engine tests 2012-11-12 13:44:40 +01:00
Igor Motov bf13f3f81e lucene4: fixed SimpleIndexQueryParserTests 2012-11-12 13:44:39 +01:00
Martijn van Groningen db639e5c2e lucene 4: Upgraded SimpleLuceneTests class. Test actually passes now. 2012-11-12 13:44:39 +01:00
Martijn van Groningen 2a8161d096 lucene 4: Upgraded SimpleLuceneTests class.
The complete codebase compiles now!
2012-11-12 13:44:39 +01:00
Martijn van Groningen aa2a8c66cc lucene 4: Upgraded UidFieldTests class. 2012-11-12 13:44:39 +01:00
Martijn van Groningen 5c0ef796e8 lucene 4: Upgraded BoostMappingTests + SimpleMapperTests 2012-11-12 13:44:39 +01:00
Shay Banon cefe2ba870 lucene 4: fix fuzzy query test 2012-11-12 13:44:39 +01:00
Igor Motov 787b7a3900 lucene4: more unit test cleanup 2012-11-12 13:44:37 +01:00
Igor Motov 5ad40205c2 lucene4: remove DocumentBuilder and FieldBuilder 2012-11-12 13:44:37 +01:00
Igor Motov bb76542068 lucene4: unit tests cleanup 2012-11-12 13:44:37 +01:00
Igor Motov 6b4e483f55 lucene4: fixed unit.index.mapper, unit.index.query and unit.index.store test (with exception of document boost and similarity issues) 2012-11-12 13:44:37 +01:00
Igor Motov 6bbe37f876 lucene4: fixed integration tests that got broken by switch from String to Text in Facet terms 2012-11-12 13:44:36 +01:00
Shay Banon f572a7bcf7 lucene 4: no close on searcher anymore 2012-11-12 13:44:31 +01:00
Martijn van Groningen 454954e7be lucene 4: Fix field data, facets and field comparators 2012-11-12 13:44:31 +01:00
Shay Banon a8e43578a2 Adding a type with _source or _all enabled fails, when these are disabled in index
fixes #2394
2012-11-09 17:21:25 +01:00
Igor Motov af1e8c0eb1 Add auto index creation on update request
Fixes #2375
2012-11-02 10:18:51 -04:00
Aaron Dixon bd9a5bfa0c fixed issue2371 (incorrect behavior of path_match) 2012-11-01 22:11:33 +01:00
Chris Male 768b8b4d2b Changed SpatialRelation contains to within 2012-11-01 22:03:47 +01:00
Igor Motov 23f7b0002a Deleting a non-existent warmer shouldn't cause request to hang
Fixes #2363
2012-11-01 21:49:54 +01:00
Igor Motov a2628b5eb2 Upsert should return fields Fixes #2362 2012-11-01 21:43:23 +01:00
Igor Motov 29928a9e15 Add test for percolation with the _size field enabled 2012-10-23 23:08:52 +02:00
Igor Motov c551f93cae Add highlighter type switch 2012-10-23 02:25:55 +02:00
Shay Banon 04eabbd38a Mapping: string mapping to automatically set omit_norms to true and index_options to docs when setting index to not_analyzed
closes #2349
2012-10-22 19:01:01 +02:00
Martijn van Groningen ee5df74a6b Fixed delete by query issue with index aliases and nested mappings.
The issue was that under these circumstances the delete by query operation would run forever.
What also is fixed is that during shard recovery when delete by query is replayed nested docs
are also deleted. Closes #2302
2012-10-20 23:39:15 +02:00
Shay Banon 246dc1d992 formatting 2012-10-19 09:48:48 +02:00
Simon Willnauer b6a83fd8b2 #2332 support CustomScoreQuery highlighting and expose QueryBuilder on CustomScoreQuery via Java API 2012-10-19 09:23:16 +02:00
Martijn van Groningen 51e69e1a9e Fixed NPE when using `has_parent` or `has_child` filter/query.
The NPE occurred when for an arbitrary segment no parent documents exist for a has_parent filter/query and no child documents exist for a has_child filter/query.

Closes #2297
2012-09-28 17:08:30 +02:00
Shay Banon 613c70c289 introduce TransportResponse
a class that needs to be used when sending a response over the transport layer, with an option to have headers
2012-09-27 18:05:16 +02:00
Shay Banon cfe7504d1c introduce TransportRequest (with optional headers)
introduce a new class, TransportRequest, which includes headers. This class can be used when sending requests over the transport layer, and ActionRequest also extends it now.
This is the first phase of the refactoring part in the transport layer and action layer to allow for simpler implementations of those as well as simpler "filtering" capabilities in the future
2012-09-26 23:46:28 +02:00
Martijn van Groningen 81a6940ad3 Fixed score explain for `custom_filters_score` query.
Only the explain of the filter was included. This fix adds an explain for the inner query and wraps it in a top-level explanation.
2012-09-24 14:00:34 +02:00
Shay Banon 6e66f45f58 minor geo shape fetch improvements 2012-09-23 21:31:13 +02:00
Chris Male 05e0b4d4e0 Added ShapeFetchService with support in GeoShapeQueryParser/FilterParser 2012-09-23 21:24:13 +02:00
Chris Male 4f5e62e988 Added MultiPolygon parsing and serialization support 2012-09-21 14:03:21 +02:00
Martijn van Groningen 8080fdc509 Added types exists api
The types exists api checks whether one or more types exist in one or more indices.

## Example usage
curl -XHEAD 'localhost:9200/twitter/tweet'

## Options
* `index` - One or more indices. Either specified as query string parameter or in the uri path.
* `type` - One or more types. Either specified as query string parameter or in the uri path.
* `ignore_missing` -  Determines what type of indices to exclude from a request. The option can have the following values: `none` or `missing`.
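The request from the example usage can be built with Python's standard library as sketched below. The sketch only constructs the `HEAD` request and does not send it. Joining several indices or types with commas in the uri path is an assumption, as is the helper name.

```python
from urllib.parse import quote
from urllib.request import Request


def types_exists_request(indices, types, host="http://localhost:9200"):
    """Build a HEAD request that checks types in indices via the uri path."""
    # Assumption: multiple indices or types are comma separated in the path.
    path = "/".join(quote(",".join(part), safe=",") for part in (indices, types))
    return Request(f"{host}/{path}", method="HEAD")


req = types_exists_request(["twitter"], ["tweet"])
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The response carries no body, so the outcome is presumably signalled by the HTTP status code alone.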

Closes #2273
2012-09-21 10:21:32 +02:00
Chris Male 6fc0b83e07 Upgraded to Spatial4j 0.3 2012-09-20 23:55:51 +02:00
Shay Banon 0fadbf2177 more easily add a field with boost to multi match builder 2012-09-20 12:38:13 +02:00
Shay Banon 83a39bd509 improve test by not waiting for green state, no need 2012-09-20 12:00:58 +02:00
Shay Banon f8e1291243 global node indices level queries to be created by guice 2012-09-20 11:54:46 +02:00
Martijn van Groningen d5aa35e0ea Added better error handling for has_child, has_parent and top_children.
If has_parent, has_child or top_children are executed incorrectly then a better exception is thrown. This gives a better error description when one of these queries or filters is being used in count api.

Closes #2261
2012-09-18 13:26:23 +02:00
Shay Banon 7924115b90 Disable allocation: New indices allocation not to be disabled by default
When setting cluster.routing.allocation.disable_allocation, it causes primary shards of new indices to not be allocated. By default, newly created indices should be allowed to, at the very least, allocate primary shards so they become operational. A new setting, cluster.routing.allocation.disable_new_allocation, allows to also disable "new" allocations.
closes #2258.
2012-09-17 16:00:55 +02:00
Shay Banon 90e0a70e0e cancel allocation command to allow_primary to be cancelled 2012-09-17 12:31:33 +02:00
Shay Banon afca5ef15f The reroute command allows to explicitly execute a cluster reroute allocation command including specific commands. For example, a shard can be moved from one node to another explicitly, an allocation can be canceled, or an unassigned shard can be explicitly allocated on a specific node.
Here is a short example of a simple reroute API call:

    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands" : [
            {"move" : {"index" : "test", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}},
            {"allocate" : {"index" : "test", "shard" : 1, "node" : "node3"}}
        ]
    }'

An important aspect to remember is that once an allocation occurs, the cluster will aim to rebalance its state back to an even state. For example, if the allocation includes moving a shard from `node1` to `node2` in an "even" state, then another shard will be moved from `node2` to `node1` to even things out.

The cluster can be set to disable allocations, which means that only the explicit allocations will be performed. Obviously, only once all commands have been applied will the cluster aim to rebalance its state.

Another option is to run the commands in "dry_run" (as a URI flag, or in the request body). This will cause the commands to be applied to the current cluster state, and return the resulting cluster state after the commands (and rebalancing) have been applied.

The commands supported are:

* `move`: Move a started shard from one node to another node. Accepts `index` and `shard` for index name and shard number, `from_node` for the node to move the shard "from", and `to_node` for the node to move the shard to.
* `cancel`: Cancel allocation of a shard (or recovery). Accepts `index` and `shard` for index name and shard number, and `node` for the node to cancel the shard allocation on.
* `allocate`: Allocate an unassigned shard to a node. Accepts `index` and `shard` for index name and shard number, and `node` for the node to allocate the shard to. It also accepts the `allow_primary` flag to explicitly specify that allocating a primary shard is allowed (this might result in data loss).
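
A request body using the commands listed above can also be assembled programmatically. The following is only a minimal sketch in Python using the standard library; the helper name `build_reroute_body` is hypothetical, and placing `dry_run` as a top-level key in the body is an assumption based on the description above (it can equally be passed as a URI flag).

```python
import json


def build_reroute_body(commands, dry_run=False):
    """Assemble a reroute request body from (command_name, params) pairs.

    Hypothetical helper for illustration; it only builds the JSON structure
    and does not talk to a cluster.
    """
    body = {"commands": [{name: dict(params)} for name, params in commands]}
    if dry_run:
        # Assumption: dry_run is accepted as a top-level key in the body.
        body["dry_run"] = True
    return body


body = build_reroute_body(
    [
        ("move", {"index": "test", "shard": 0, "from_node": "node1", "to_node": "node2"}),
        ("cancel", {"index": "test", "shard": 1, "node": "node3"}),
        ("allocate", {"index": "test", "shard": 2, "node": "node3", "allow_primary": False}),
    ],
    dry_run=True,
)

# The serialized body could then be sent with curl -XPOST as in the example above.
print(json.dumps(body, indent=4))
```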

closes #2256
2012-09-15 22:56:14 +02:00
Martijn van Groningen cfe76546f2 Added has_parent query The `has_parent` query works the same as the `has_parent` filter, by automatically wrapping the filter with a constant_score. It has the same syntax as the `has_parent` filter. Closes #2254 2012-09-14 17:34:01 +02:00
Martijn van Groningen 3cd54fc4ee Improve `has_child` filter / query performance (#2251) Added a new has_child filter implementation that is _uid based instead of bitset based. This implementation is roughly 2 to 6 times faster (depending on the query) than the existing bitset based implementation. 2012-09-14 14:37:29 +02:00
Shay Banon ef9974ce2c add serialization options for allocation commands 2012-09-14 14:26:39 +02:00
Martijn van Groningen 2bd9b3aed0 Added `has_parent` filter (#2243)
The `has_parent` filter accepts a query and a parent type. The query is executed in the parent document space, which is specified by the parent type. This filter returns child documents whose associated parents have matched. Otherwise, the `has_parent` filter has the same options and works in the same manner as the `has_child` filter.

This is an experimental filter.

Filter example
###################
```
{
    "has_parent" : {
        "parent_type" : "blog",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}
```
The `parent_type` field name can also be abbreviated to `type`.
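
The matching behaviour described above can be illustrated with a toy in-memory model. This is only a sketch of the semantics in Python, not how the filter is implemented; the document layout, the `has_parent` function, and the sample data are all hypothetical.

```python
# Toy in-memory illustration of has_parent semantics (not Elasticsearch code).
# Parent documents of type "blog", keyed by a made-up id.
parents = {
    "b1": {"tag": "something"},
    "b2": {"tag": "other"},
}

# Child documents, each pointing at its parent id.
children = [
    {"id": "c1", "parent": "b1"},
    {"id": "c2", "parent": "b2"},
    {"id": "c3", "parent": "b1"},
]


def has_parent(children, parents, field, value):
    """Return the children whose parent matches a term query on `field`."""
    # Step 1: run the query in the parent document space.
    matched = {pid for pid, doc in parents.items() if doc.get(field) == value}
    # Step 2: keep only children associated with a matching parent.
    return [child for child in children if child["parent"] in matched]


result = has_parent(children, parents, "tag", "something")
print([child["id"] for child in result])
```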

Memory considerations
###############
With the current implementation, all _id values are loaded into memory (heap) in order to support fast lookups, so make sure there is enough memory for it.

This issue originates from issue #792
2012-09-13 13:35:45 +02:00
Shay Banon e530f03b94 internal custom allocation commands
add support for internal custom allocation commands, including allocate, move, and cancel (shard).
also, fix #2242, which causes the cluster state to be in an inconsistent state when a shard that is the source of a relocation is failed
2012-09-12 15:13:27 +02:00
Martijn van Groningen b6a9bd9a31 - Fixed boosting per field with multi_match query. 2012-09-12 11:43:33 +02:00
Shay Banon 24ce2ef537 handle EOF when handling arrays as well 2012-09-05 11:40:39 +02:00
Shay Banon a42159f8d5 Shard Allocation: `index.routing.allocation....` settings do not "remove" the setting on empty string, closes #2229. 2012-09-03 16:44:23 +02:00
Martijn van Groningen 9b29950997 Added fields option to explain api. #2203 2012-08-31 22:19:09 +02:00
Martijn van Groningen cd0e1226e1 Added a global ignore_malformed index setting. #2220 Also extended the ignore_malformed support to TTL, Ip and timestamp field types. 2012-08-31 22:10:46 +02:00
Martijn van Groningen dea2de3304 Add ignore_indices option for search, multisearch, count and other Broadcast request. #2209 2012-08-27 15:36:14 +03:00
Martijn van Groningen 1d4aee6086 - Explain api opens 2 engine searchers, but closes only 1 engine searcher. Closes #2206 2012-08-27 12:20:02 +03:00
Martijn van Groningen bbe735f2cc Fixed issue #2197 2012-08-25 00:38:26 +03:00
uboness b4b33bb205 Local node master listener
* Fixed an issue where dynamic update to minimum_master_nodes settings would not take immediate effect
* Added LocalNodeMasterListener support to the ClusterService. Enables listening to when the local node becomes/stops being a master
2012-08-24 02:25:13 +02:00
uboness 3fdb9f0a27 Enabled the option of configuring plugin types in the settings. This will also help in tests when testing plugin related functionality 2012-08-21 23:00:24 +02:00
Martijn van Groningen 8365e7ba0b - Added explain api. #2184 2012-08-21 13:26:17 +02:00
Shay Banon 9aae62b4a6 All Field: Automatically detect when field level boosting is used, and optimize when it's not, closes #2189. 2012-08-20 15:07:32 +02:00
Shay Banon e3a9271000 unify more count and search implementation 2012-08-19 16:54:53 +02:00
Simon Willnauer b0b5775c98 use term query instead of a specialized SpanTermQuery on _all field if positions are omitted 2012-08-16 10:42:14 -07:00
Shay Banon ab49a8c2fc improve update test to wait for green cluster state 2012-08-14 01:47:18 +02:00
Simon Willnauer 53f65d8ff2 Remove / deprecated omit_term_freq_and_positions in favor of IndexOptions exposed via mapping API 2012-08-13 17:19:08 +02:00
Shay Banon eda3da2aea fix geo shape tests 2012-08-13 14:40:36 +02:00
Chris Male bea4346f3a Added GeoShape indexing and querying support 2012-08-13 13:44:29 +02:00
Martijn van Groningen b979dfa0be Add lenient option to match & multi_match queries. #2156 2012-08-09 21:56:50 +02:00
Shay Banon fedd1965ea Update API: Update through an alias with routing configured on it fail to use the routing, closes #2155. 2012-08-09 15:14:52 +02:00
Martijn van Groningen e43dd4687e - Added support for multi match query. 2012-08-09 11:36:59 +02:00
Martijn van Groningen 195e586fd8 - Fixed timezone parsing when input starts with '+'sign. Fixes issue #2141 2012-08-07 22:53:00 +02:00
Martijn van Groningen 37e7a54b0e Fixed top children query bug reported in issue #2140
Fixed type.
2012-08-06 22:02:03 +02:00
Martijn van Groningen 0e3c825501 Added ignore_malformed mapping parameter for all number like types. Issue #2120 2012-08-03 10:41:07 +03:00
Shay Banon 7a0d7f531d fix test 2012-08-02 09:40:54 +03:00
Shay Banon e88dbafe51 rename Test to Tests, so it will be executed as part of the mvn tests as well, reformat a bit 2012-08-01 16:20:37 +03:00
Simon Willnauer d13a7809d1 #2116 Expose all ShingleFilter settings via ShingleTokenFilterFactory 2012-08-01 16:18:58 +03:00
Shay Banon 0492d9b8cb fix test failure message... 2012-07-31 21:02:34 +02:00
Shay Banon 82cfe0e8b2 upgrade to latest testng, improve console output when running test, add more options as env vars when using maven 2012-07-31 20:24:39 +02:00
Shay Banon bbc45fefe5 rename limit to ignore_above, and create a dedicated test 2012-07-31 13:00:10 +02:00
Martijn van Groningen 41b3a454cf Issue #2121 Added limit parameter for string type. 2012-07-31 13:00:03 +02:00
Shay Banon 4eb85bbbd6 Transport/Http: Remove explicit setting of send/receive buffer, and improve netty receive buffer predictor, closes #2124. 2012-07-30 21:37:38 +02:00
Shay Banon 7edafcf9a0 Node Stats: Add jvm buffer pools stats (when available, for java 7 and above), closes #2122. 2012-07-29 00:49:18 +02:00
Shay Banon 57e966e9d7 upgrade to jackson 2.0.4 2012-07-10 23:44:02 +02:00
Shay Banon 99d2f27c84 Introduce Text abstraction, allowing for improved representation of strings, apply to HighlightedField (breaks backward for Java API from String to Text), closes #2093.
By introducing the Text abstraction, we can keep (long) text fields in their UTF8 bytes format, and no need to convert them to a string when serializing it back to Json for example.

The first place we can apply this is to highlighted text, which can be long. This does break backward compatibility for people using the Java API, where the HighlightField now has a Text as its content, and not a String.
2012-07-10 00:47:37 +02:00
Shay Banon 35233564fd buffer management refactoring
First phase at improving buffer management and reducing even further buffer copies. Introduce a BytesReference abstraction, allowing to more easily slice and "read/write references" from streams. This is the foundation for later using it to create smarter buffers on top of composite netty channels for example (which http now produces) as well as reducing buffer copies when sending transport/rest responses.
2012-07-07 01:26:41 +02:00
Shay Banon 8d1e04a973 have the quick rolling restart stress test also wait for 0 relocating shards 2012-07-06 01:01:18 +02:00
Shay Banon 57023c8ba9 Compression: Support snappy as a compression option, closes #2081. 2012-07-04 17:14:12 +02:00
Shay Banon e5c89def42 Support wildcard and +/- notation for multi index APIs, closes #2074. 2012-07-01 18:16:04 +02:00
Shay Banon 565db26e13 Store Compression: integer overflow causes failed reads (index is safe), closes #2071. 2012-06-30 01:37:46 +02:00
Shay Banon 8bab859822 simplify tests, doc file length 2012-06-29 16:01:17 +02:00
Shay Banon f2e39e4ee2 Auto import dangling indices, closes #2067. 2012-06-29 01:01:26 +02:00
Shay Banon a872c88f03 dangling index handling might still remove the state files for the dangling index, closes #2065. 2012-06-28 13:32:44 +02:00
Matt Weber d6bc17fee5 Partial update without script
Allow the use of "doc" as the update source when a script is not
specified.  New fields are added, existing fields are overwritten, and
maps are merged recursively.
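
The merge rule described above can be sketched as follows. This is a minimal illustration in Python, not the actual implementation; the function name `merge_doc` and the sample documents are hypothetical.

```python
def merge_doc(source, doc):
    """Merge the partial `doc` into a copy of `source`.

    New fields are added, existing fields are overwritten, and nested
    maps (dicts) are merged recursively. Hypothetical helper that only
    illustrates the rule stated in the commit message.
    """
    merged = dict(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are maps: merge them instead of replacing.
            merged[key] = merge_doc(merged[key], value)
        else:
            # New field, or a non-map value: add or overwrite.
            merged[key] = value
    return merged


existing = {"title": "old", "meta": {"views": 1, "lang": "en"}}
partial = {"title": "new", "tags": ["a"], "meta": {"views": 2}}
updated = merge_doc(existing, partial)
print(updated)
```

Note that in this sketch lists are treated like any other non-map value and are overwritten rather than concatenated; the commit message does not say how arrays are handled.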
2012-06-27 21:29:22 +02:00
Igor Motov a4ad84b5e4 Enable validation of queries with has_child and script filters 2012-06-27 21:23:02 +02:00
Igor Motov dbeda1ab2b Add missing serialization for error and explanation in validate query request 2012-06-27 21:23:01 +02:00
Shay Banon 2b893fe1e5 Use bloom filter when flushing (applying deletes), closes #2058. 2012-06-26 16:45:29 +02:00
Shay Banon 12a644c89b Stored Compression: failure to fetch document in certain cases (read failure, index compression works), closes #2055. 2012-06-26 01:54:19 +02:00
Shay Banon 6e7764a083 reduce objects created with bloom filter operations 2012-06-24 20:58:44 +02:00
Shay Banon 2fb867b467 Store Compression: Term Vector Vector, closes #2049. 2012-06-23 23:11:00 +02:00
Shay Banon 6fb836c25e better thread naming 2012-06-23 18:35:42 +02:00
Shay Banon 1780a2a067 Failure to recover properly on node(s) restart
When a node restarts, it might be canceling one recovery of a shard id only to get another one in the next cycle. We should detect this case and handle it properly.

This is a fix to the annoying message seen by users: suspect illegal state: trying to move shard from primary mode to replica mode.
2012-06-22 17:46:57 +02:00
Shay Banon cc3fab45ff Improve cluster resiliency to disconnected sub clusters + fix a shard allocation bug with quick rolling restarts
Two main changes:

Improve cluster resiliency to disconnected sub clusters. If a node pings a master and that node is no longer registered with the master, improve the rejoin process of that node to the cluster. Also, if a master receives a message from another master, pick one to force to rejoin the cluster (based on cluster state versioning).
On a quick rolling restart, without waiting for shard allocation, the shard allocation logic can mess up its counts, causing strange logic in allocating shards, or validation failures on routing table allocation.
2012-06-22 03:36:54 +02:00
Shay Banon b009c9c652 Stored Fields Compression, closes #2037.
Compressing the stored fields file (the .fdt file) directly allows for better compression of the index, specifically when indexing (and storing) small documents. The compression will be considerably more effective compared to compressing each doc on its own (when setting compress on the _source mapper). The downside is that more data needs to be uncompressed when loading documents.

The setting that controls it is `index.store.compress.stored_fields`, set to `true` (it defaults to `false`), and it can be enabled dynamically using the update settings API. This allows compression to be enabled at a later stage (e.g. on old time based indices), after which the index can be optimized to make sure it gets compressed.
2012-06-20 05:31:34 +02:00
Shay Banon fbf4c70af9 add simple compression bench 2012-06-19 13:15:44 +02:00
Martijn van Groningen d66f401ce6 Better fix for mv field highlighting issue #1994 2012-06-19 04:13:47 +02:00
Shay Banon aebd27afbd abstract compression
abstract the LZF compression into a compress package allowing for different implementation in the future
2012-06-19 04:07:11 +02:00
Shay Banon 1a98a9184e fix test to shutdown threadpool 2012-06-19 03:37:08 +02:00
Shay Banon 7b3b130a62 fix tests to shutdown threadpool 2012-06-19 03:33:44 +02:00
Chris Male 040fa2581a Added GeoDistance test which verifies the difference in behaviour between ARC and PLANE, causing elliptical results 2012-06-15 22:55:45 +02:00
Shay Banon 982c8b4664 fix test to work with new normalization 2012-06-14 15:55:33 +02:00
Chris Male 2315e6d239 Incorporated changes to normalization of latitude and longitudes so latitude normalization is correct and longitude is normalized at the same time 2012-06-14 15:43:36 +02:00
Shay Banon 133bd72f8d Multi Search API: Allow to set search_type on REST endpoint URI to apply to all search requests, closes #2023. 2012-06-13 20:47:24 +02:00
Shay Banon dfe6e58e37 use an array to represent the keys in the uid filter 2012-06-13 16:03:45 +02:00
Shay Banon 6eb419649a better/faster parsing of update request (with upsert) 2012-06-13 13:12:37 +02:00
Shay Banon 0b4fe4add3 rename doc to upsert in update API
a better descriptive name for it, and won't clash with future features on the update api
2012-06-13 12:42:10 +02:00
Martijn van Groningen 1319ed9322 Fixes highlight issue for multivalues fields described in issue #1994 2012-06-11 23:44:45 +02:00
Shay Banon 9905eab73a Update API: Allow to upsert, provide a doc and index it if the doc does not exist, closes #2008. 2012-06-08 02:01:04 +02:00
Shay Banon ccea825966 terms filter uses less memory when cached
move from a TreeSet to an array, sorting on creation
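
The idea of replacing a tree-based set with an array sorted on creation can be sketched as below. This is only an illustration in Python; the class name `SortedTerms` is hypothetical, and looking terms up by binary search is an assumption, since the commit message does not state how the array is queried.

```python
from bisect import bisect_left


class SortedTerms:
    """Immutable term set backed by a single sorted list.

    Hypothetical sketch: one contiguous array replaces per-entry tree
    nodes, which is where the memory saving would come from.
    """

    def __init__(self, terms):
        # Sort once on creation; duplicates are dropped.
        self._terms = sorted(set(terms))

    def __contains__(self, term):
        # Assumption: membership is checked with a binary search.
        i = bisect_left(self._terms, term)
        return i < len(self._terms) and self._terms[i] == term

    def __len__(self):
        return len(self._terms)


terms = SortedTerms(["kimchy", "elasticsearch", "lucene", "kimchy"])
print("lucene" in terms, "solr" in terms, len(terms))
```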
2012-06-07 23:34:21 +02:00
Shay Banon f87632fabd Query DSL: term/terms filter performance improvement (bulk reading), closes #1972. 2012-05-23 21:54:31 +02:00
Shay Banon 2c274e59d5 Percolator: Registering (indexing) a new percolator query will still be stored in memory if actually indexing it fails, closes #1965. 2012-05-19 19:36:01 +02:00
Shay Banon f0007fd4ae Create Index: Allow to provide index warmers when creating an index, closes #1917. 2012-05-07 14:27:30 +03:00
Shay Banon ca2dc1801c Index Template: Allow to register index warmers in an index template, closes #1916. 2012-05-07 14:00:37 +03:00
Shay Banon e0f3b7e885 Index Warmup API, closes #1913. 2012-05-06 18:50:35 +03:00
Shay Banon aeae380258 ClassCastException during percolation query, closes #1905. 2012-05-03 17:57:52 +03:00
Shay Banon 07f3ed05b0 Search Preference: Add _shards prefix to explicitly list shards, and add _prefer_node option, closes #1904 2012-05-03 01:12:22 +03:00
Shay Banon 8ca36c8dd5 allow internally to register index warmup actions, as well as expose stats on it 2012-04-29 00:37:20 +03:00
Shay Banon a4fb33dbc3 Date Histogram Facet: Add `quarter` as an interval, closes #1884. 2012-04-24 19:04:09 +03:00
Shay Banon 98b1f368f5 Better handling of fields that have `.` in their name when doing property based navigation, closes #1875. 2012-04-19 17:28:14 +03:00
Shay Banon 03c9eaf812 NullPointerException in geo_distance_range without to, closes #1865. 2012-04-17 15:51:45 +03:00
Shay Banon 16cd159a38 Upgrade to Lucene 3.6, closes #1862. 2012-04-15 17:39:41 +03:00
Shay Banon b78680c7ae Java API Query DSL: Add wrapper filter similar to wrapper query accepting a json filter in raw format, closes #1844. 2012-04-04 19:53:17 +03:00
Shay Banon cdfa87827a Update API: Allow to specify fields in the request to return updated fields, closes #1838. 2012-04-03 14:11:22 +03:00
Benjamin Devèze 0cf0703a7b add fields parameter for update API (#1822) 2012-04-03 13:35:12 +03:00
Shay Banon 9fb6ecf9f0 allow to more easily plug custom unicast host providers by being able to add them to ZenDiscoveryModule using a plugin 2012-03-31 21:38:39 +03:00
Igor Motov 8859594e36 add extended validation information 2012-03-24 13:40:25 +02:00
Shay Banon 348ed11450 Have streams provided to gateway (shared one) allow marking, closes #1803. 2012-03-22 12:20:00 +02:00
Shay Banon 752ae6e206 optimize acquiring search handler to use a search manager, also, creating a ContextIndexSearcher can be optimized if it is created from a searcher 2012-03-09 22:41:09 +02:00
Shay Banon c08b968246 rename the cached thread pool to generic (from cached), since really, cached is meaningless, and it's actually a generic thread pool we use for different operations 2012-03-09 20:32:33 +02:00
Shay Banon e707e93942 Index Blocks: Add index.blocks.write, index.blocks.read, and index.blocks.metadata settings, closes #1771. 2012-03-08 21:56:13 +02:00
Shay Banon 5b76222ee7 Merge branch 'create-post-bug' of https://github.com/Paikan/elasticsearch 2012-03-01 14:54:19 +02:00
Shay Banon feaccee246 Multi level parent/child mapping and search fails, closes #1751. 2012-03-01 14:23:58 +02:00
Benjamin Devèze 7231ee832a set missing create param in PutRequest 2012-02-29 17:56:53 +01:00
Shay Banon c72772e621 msearch should accept a leading \n, closes #1736. 2012-02-27 00:27:43 +02:00
Shay Banon 9d724b8a14 fix test 2012-02-21 13:44:31 +02:00
Shay Banon 0bf61ab6c8 add pre/post zone, pre/post offset, and factor to date histogram builder 2012-02-21 12:43:28 +02:00
Shay Banon c6130b95e5 allow to provide no header (but still \n) for msearch 2012-02-20 22:00:43 +02:00
Shay Banon 4a9cb6408c API: Multi Search, closes #1722. 2012-02-20 18:57:27 +02:00
Benjamin Devèze 36a4cde89f add update integration tests 2012-02-17 23:09:52 +01:00
Shay Banon 7bd87e12a2 Indices query should accept alias names, closes #1698. 2012-02-17 15:03:52 +02:00
Shay Banon f997315f54 Date Mapping: Support "date math" when searching, closes #1708. 2012-02-16 18:10:12 +02:00
Shay Banon 278e5d3a43 Transport buffer overrun can happen because of byte buffer reading optimization introduced in 0.19.0.RC1, closes #1686. 2012-02-09 00:15:08 +02:00
Shay Banon 457f0a4266 Avoid placing a shard replica on the same machine as shard itself, closes #1680. 2012-02-08 15:39:01 +02:00
Shay Banon a5838dc403 improve test, wait for green state post master node startup 2012-02-01 21:17:45 +02:00
Shay Banon f6deb45970 Cluster Allocation: cluster.routing.allocation.allow_rebalance does not allow for rebalancing on relocating shard, closes #1651. 2012-01-30 01:58:51 +02:00
Shay Banon 70c334ec01 Index Allocation: allow to specify maximum total number of shards per node, closes #1650. 2012-01-30 01:43:18 +02:00
Shay Banon 49b6d70dfd Query DSL: prefix query to support _id, closes #1648. 2012-01-29 21:09:11 +02:00
Shay Banon bb6fb6e083 improve test to wait for 2 nodes 2012-01-28 00:26:53 +02:00
Shay Banon da433df217 Mapping: _source mapping to allow for format to convert to (if needed), closes #1639. 2012-01-26 00:18:46 +02:00
Shay Banon 68bb5d1434 by default, index metadata to be stored in smile format and store binary format mapping and alias filter to improve the cost it takes to persist them 2012-01-25 11:58:29 +02:00
Shay Banon c1a2a5c910 close the multicast socket in test 2012-01-24 13:10:38 +02:00
Shay Banon 1b7d329307 add a local gateway test to make sure we recover also latest state when updating index metadata and templates 2012-01-23 00:50:32 +02:00
Shay Banon 942b427940 Local Gateway: Store specific index metadata under dedicated index locations, closes #1631. 2012-01-22 23:34:34 +02:00
Shay Banon 534f487de3 Local Gateway: Move shard state to be stored under each shard, and not globally under _state, closes #1618. 2012-01-18 01:08:35 +02:00
Shay Banon 801c709b42 test with local gateway 2012-01-18 01:02:55 +02:00
Benjamin Devèze 0810808864 fix bug in TTL handling where default TTL value was not set properly 2012-01-17 10:35:16 +01:00
Shay Banon bddea09170 /_status doc count of index wrong, closes #1615. 2012-01-16 13:48:31 +02:00
Shay Banon 21405f5aa4 Highlighting: Add boundary_chars and boundary_max_size to control text boundaries with fast vector highlighter (term vector), closes #1614. 2012-01-15 23:05:34 +02:00
Shay Banon e37c0904f0 Add generic execution of APIs to Client (and indices/cluster) and allow for plugins to register custom APIs, closes #1612. 2012-01-15 16:15:09 +02:00
Shay Banon 8ee6ee05cd Java API: Move all request builders to org.elasticsearch.action... from org.elasticsearch.client.action, closes #1611. 2012-01-15 12:44:50 +02:00
Shay Banon d2d65f2f65 add test marker on the class as well 2012-01-12 16:59:43 +02:00
Olivier Favre 8f0ecbcc0b Improve latitude and longitude normalization 2012-01-12 16:58:44 +02:00
Shay Banon 04a138db5d Allow to provide timeout parameter in request body (as well as URI parameter), closes #1604. 2012-01-12 14:19:21 +02:00
Shay Banon 771dbdb4bc doc nested docs and get / uid 2012-01-11 15:01:40 +02:00
Shay Banon 5b2854e8bb Date Histogram Facet: Add `pre_offset` and `post_offset` options, closes #1599. 2012-01-09 21:28:56 +02:00
Shay Banon d149cbb06e query builder builds a "safe" byte array 2012-01-09 00:17:53 +02:00
Shay Banon 0f1b3f0457 delete by query to use byte reference serialization 2012-01-08 20:52:48 +02:00
Shay Banon 858195351b translog actions to use bytes ref serialization, and have the option to mark BytesStreamInput as unsafe 2012-01-08 17:23:37 +02:00
Shay Banon 45b5594e9b sleep before checking for no master block 2012-01-08 12:17:53 +02:00
Shay Banon e059e213db removed phonetic, fix test config files 2012-01-08 12:06:30 +02:00
Shay Banon 3d51553cf2 Move phonetic token filter to a plugin, closes #1594. 2012-01-07 23:18:30 +02:00
Shay Banon aec5af3800 clean more test yml files 2012-01-07 00:08:09 +02:00
Shay Banon 164df9979a remove yml file conf for test 2012-01-06 23:43:35 +02:00
Shay Banon 5c7d1d0984 remove yml file conf for test 2012-01-06 23:41:28 +02:00
Shay Banon ec8b7c3e23 No master (startup / minimum_master_node) / not recovered blocks should cause proper failures on operations, closes #1589. 2012-01-06 23:38:41 +02:00
Shay Banon a18021c778 Filter cache to have just weighted (node) and none, and index query parser cache to be size based, closes #1590. 2012-01-05 20:44:09 +02:00
Benjamin Devèze d95aa9f266 add ttl tests with routing 2012-01-04 23:37:34 +01:00
Shay Banon e5f2ce0fd6 use factor in scripts, so custom score function will work correctly when it multiplies 2012-01-04 21:53:26 +02:00
Shay Banon 761862a9a9 nicer exception names 2012-01-03 01:05:08 +02:00
Shay Banon 83d5084f62 Update API: Allow to update a document based on a script, closes #1583. 2012-01-02 22:02:19 +02:00
Shay Banon 8c6b2a3077 Date Histogram Facet: Improve time zone handling, add factor option, closes #1580. 2012-01-01 00:09:57 +02:00
Shay Banon 8cf8b478af Scan Search: Improve performance while scrolling through it, closes #1579. 2011-12-31 17:49:19 +02:00
Shay Banon e47ec96ca2 Merge branch 'master' of https://github.com/dakrone/elasticsearch 2011-12-29 14:17:50 +02:00
Lee Hinman f6b036f713 Refactor validate to validateQuery and move into indices admin action 2011-12-28 15:27:59 -07:00
Shay Banon 4e6217c54d simplify toString for cached filter 2011-12-28 23:35:04 +02:00
Lee Hinman be6e18cb36 Add query validation feature 2011-12-27 13:51:59 -07:00
Shay Banon 5049f60b6c Set an index / indices to read only, or make the cluster read only, closes #1573. 2011-12-27 20:35:07 +02:00
bbgordonn 661d04e9de #1452 closed: block writes or metadata changes if {index,cluster}.read_only is set. 2011-12-27 17:19:03 +02:00
Shay Banon cc3f44473f Search: Support partial fields that can return a partial view of the _source, closes #1570. 2011-12-26 16:49:55 +02:00
Shay Banon aa078788f9 Nested objects not deleted on "delete by query", closes #1537. 2011-12-25 13:33:02 +02:00
Shay Banon 73b74847aa cleanup test 2011-12-22 23:24:12 +02:00
jayson.minard 52e6327467 unit tests for issue 1560, customfiltersscore min and multiply search modes 2011-12-22 23:19:58 +02:00
Shay Banon 415ee6425a Allow search to continue when sort field is missing from type mapping, closes #1558. 2011-12-22 14:25:54 +02:00
Shay Banon fe4ba2ad55 Improve multi field mapper with highlighting based on source, closes #1559. 2011-12-22 02:24:36 +02:00
Shay Banon 52743a05fa rename setEncoder to setHighlighterEncoder, it's not evident which encoder it refers to 2011-12-21 23:36:16 +02:00
Shay Banon 55d8d0d9c6 Analyze API: Allow to execute it without pre-creating an index, and allow to build custom analyzer (tokenizer + token_filters), closes #1555. 2011-12-21 23:24:55 +02:00
Shay Banon 2b838b808e add another path trie test for wildcard vs. constant 2011-12-20 17:42:06 +02:00
Shay Banon dd6c076454 simplify and improve scaling/blocking thread pools 2011-12-20 12:03:28 +02:00
Shay Banon 41b5c3d562 wait for yellow state in test 2011-12-19 13:51:33 +02:00
Shay Banon 0328a300eb move jmeter files under jmeter, no need for jmx 2011-12-18 01:01:12 +02:00
Shay Banon a3ca1afed5 Translog: When not sync'ing on each operation, buffer writes, closes #1549. 2011-12-18 00:19:35 +02:00
Shay Banon ec04435b06 rename test 2011-12-16 23:36:31 +02:00
Shay Banon 26fc9bcb25 abstract away the fs translog file to an interface 2011-12-16 23:30:31 +02:00
Shay Banon 922833cdc4 source not returned when * specified in fields list, closes #1541. 2011-12-14 21:18:14 +02:00
Shay Banon de861d6f43 Support Multicast discovery for external clients, closes #1532. 2011-12-11 18:54:07 +02:00
Shay Banon e7eed3c182 fix package location of trove extensions 2011-12-10 00:12:42 +02:00
Shay Banon a71f2eed99 fix test to wait for async indexing to finish 2011-12-09 10:04:11 +02:00
Shay Banon 1fd5a48409 wait for yellow status before searching 2011-12-08 16:28:57 +02:00
Shay Banon 5ea6c0bac5 wait for green status in test to make sure shards are allocated 2011-12-07 17:01:57 +02:00
Shay Banon 9781d8675d cleanups, remove unused code 2011-12-06 16:40:07 +02:00
Shay Banon 6a71eab51f finalize structure, tests pass 2011-12-06 02:43:17 +02:00
Shay Banon a8fd2d48b8 first cleanup phase, move to single src 2011-12-06 00:59:23 +02:00