Commit Graph

4785 Commits

Author SHA1 Message Date
Martijn van Groningen 637eeacb20 Better error description if field(s) (statistical facet) and value_field (term_stats facet) are not a numeric field 2013-04-10 11:11:52 +02:00
Martijn van Groningen 6a3c53ef44 Should prevent OOM 2013-04-10 10:00:51 +02:00
Martijn van Groningen b8b28041e5 Fix for extended facets test. 2013-04-10 00:47:00 +02:00
Shay Banon 80fc55a01d upgrade to netty 3.6.5 2013-04-09 09:44:49 -07:00
Igor Motov b0e44a2b40 Fix term counters in script field terms facet
Fixes #2878
2013-04-09 12:42:35 -04:00
Simon Willnauer ae74a8dbb7 Configure FieldData using a hash not a string
Closes #2876
2013-04-09 15:53:05 +02:00
Simon Willnauer 374bbbfa7b # FieldData Filter
FieldData is an in-memory representation of the term dictionary in an uninverted form. Under certain circumstances this FieldData representation can grow very large on high-cardinality fields like tokenized full-text. Depending on the use case, filtering the terms that are held in the FieldData representation can substantially improve execution performance and application stability.
FieldData Filters can be applied on a per-segment basis. During FieldData loading, the terms enumeration is passed through a filter predicate that either accepts or rejects each term.

## Frequency Filter

The Frequency Filter acts as a high / low pass filter based on the document frequency of each term within the segment that is loaded into field data. It can reject terms with a very high or very low frequency, based either on absolute frequencies or on percentages relative to the number of documents in the segment or, more precisely, the number of documents that have at least one value in the field being loaded in the current segment.

Here is an example mapping:

```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.frequency.min=0.001;filter.frequency.max=0.1",
                "index" : "analyzed"
            }
        }
    }
}
```
### Parameters

 * `filter.frequency.min` - the minimum document frequency (inclusive) for a term to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
 * `filter.frequency.max` - the maximum document frequency (inclusive) for a term to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
 * `filter.frequency.min_segment_size` - the minimum number of documents a segment must contain for the filter to be applied. Small segments can be excluded from filtering with this setting.
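
As a rough illustration of the thresholds described above, the following Python sketch filters a map of terms to document frequencies. It is not the Elasticsearch implementation: the function names, the `segment_docs` and `docs_with_value` arguments, and the choice to treat an omitted maximum (`None`) as "no upper bound" are assumptions made for the example.

```python
def resolve_threshold(value, docs_with_value):
    """Treat values below 1.0 as a fraction of docs_with_value, others as absolute counts."""
    if value < 1.0:
        return value * docs_with_value
    return value


def accept_term(doc_freq, docs_with_value, min_freq=0.0, max_freq=None):
    """Return True if doc_freq lies within the inclusive [min, max] bounds."""
    low = resolve_threshold(min_freq, docs_with_value)
    # Assumption: an omitted maximum means there is no upper bound.
    if max_freq is None:
        high = docs_with_value
    else:
        high = resolve_threshold(max_freq, docs_with_value)
    return low <= doc_freq <= high


def filter_terms(term_doc_freqs, segment_docs, docs_with_value,
                 min_freq=0.0, max_freq=None, min_segment_size=0):
    """Keep only the terms whose document frequency passes the filter."""
    # Segments smaller than min_segment_size are loaded unfiltered.
    if segment_docs < min_segment_size:
        return dict(term_doc_freqs)
    return {
        term: df
        for term, df in term_doc_freqs.items()
        if accept_term(df, docs_with_value, min_freq, max_freq)
    }
```

For a segment of 1000 documents with `min_freq=0.01` and `max_freq=0.5`, the bounds resolve to 10 and 500 documents, so a term found in 900 documents and a term found in a single document would both be rejected.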

## Regular Expression Filter

The regular expression filter applies a regular expression to each term during loading and loads into memory only the terms that match it.

Here is an example mapping:

```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.regex=^en_.*",
                "index" : "analyzed"
            }
        }
    }
}
```
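
The effect of the `filter.regex` setting in the mapping above can be sketched in a few lines of Python. This only illustrates the predicate, not the actual loading code: the `regex_filter` name is hypothetical, and matching the whole term (rather than any substring) is an assumption. For the pattern `^en_.*` both interpretations give the same result.

```python
import re


def regex_filter(terms, pattern):
    """Return the terms that match the pattern, preserving their order."""
    compiled = re.compile(pattern)
    # Assumption: a term is accepted only if the pattern matches it entirely.
    return [term for term in terms if compiled.fullmatch(term)]


# Only locale terms starting with "en_" would be loaded into field data.
loaded = regex_filter(["en_US", "de_DE", "en_GB", "fr_FR"], r"^en_.*")
```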

Closes #2874
2013-04-09 11:34:48 +02:00
Igor Motov acc0950957 Get template should return warmers
Fixes #2868
2013-04-08 19:12:20 -04:00
Simon Willnauer a10c80e20f ensure that modifications to the enum order trigger test failures since we rely on the ordinal 2013-04-08 23:29:56 +02:00
Simon Willnauer 7e77ddb88f use enum to represent flags and fail if flags are not respected 2013-04-08 22:56:11 +02:00
Igor Motov 2a588dc1f1 Fix IndexMissingException in get template request
Fixes #2873
2013-04-08 16:25:09 -04:00
Shay Banon 3120457bfe move to 0.90.0.RC3 snap 2013-04-08 05:48:29 -07:00
Shay Banon 3a8cba4d50 release 0.90.0.RC2 2013-04-08 05:46:26 -07:00
Shay Banon 5fa66cd592 Node Stats: Allow to explicitly get specific indices level node stats element
closes #2871
2013-04-07 20:22:48 -07:00
Shay Banon 15d7ae5983 FieldData Stats: Add field data stats to indices stats API
closes #2870
2013-04-07 18:30:24 -07:00
Martijn van Groningen 86c1714bf3 Also test the `fields` option. 2013-04-07 21:52:19 +02:00
Simon Willnauer 7ad03ed789 Use IndexOption.DOCS_ONLY for boolean fields
Closes #2866
2013-04-06 22:41:22 +02:00
Shay Banon 9f6c8c88f3 improve on shard level filter/id cache stats
use just the removal listener and back to the IndexReader#coreCacheKey as the actual field as part of the cache key
2013-04-06 00:02:42 +02:00
Shay Banon 815917fbf8 confusing code..., but we can't release the searcher in a get result case
we need that searcher later on..., need to think of how to simplify that..., added a comment for now
2013-04-05 23:27:03 +02:00
Simon Willnauer 36ffd6d582 release searcher in finally block rather than relying on an exception that is thrown 2013-04-05 22:45:52 +02:00
Shay Banon 84670212a6 Filter / Id Cache Stats: Add to Indices Stats API, revise node stats API
closes #2862
2013-04-05 20:02:32 +02:00
Simon Willnauer 5e7ad9832c Added more evil tests for different field data implementations 2013-04-05 18:12:50 +02:00
Martijn van Groningen 224faffead Added an extended test for terms facet with a decent number of documents / field values and randomly tests various options. Also fixed an issue where `regex` and `excludes` were ignored when `all_terms` was used. 2013-04-05 17:38:46 +02:00
David Pilato 4b1ec037f8 Fix test for #2668. 2013-04-05 15:00:28 +02:00
Martijn van Groningen 9b5c74d43e Made sure `all_terms` works consistently. In some cases the `all_terms` option was ignored: * Faceting on number based fields. * The `execution_type` was set to `map`. * In the case the `fields` option was used.
Closes #2861
2013-04-05 14:27:19 +02:00
Shay Banon 831ea789aa rename getByOrd to getValueByOrd (to match BytesValues.WithOrdinals)
also make it public so it can be used when iterating over ords
2013-04-05 13:56:33 +02:00
Shay Banon bcc14cde9f make numeric namings consistent with bytes ones
also add the ability to get the ordinals from DoubleValues.WithOrdinals and LongValues.WithOrdinals
2013-04-05 13:33:56 +02:00
David Pilato 36b92be212 List of existing plugins with Node Info API
We want to display information about loaded plugins in the Node Info API using the `plugin` option:

```sh
curl http://localhost:9200/_nodes?plugin=true
```

For example, on a 4-node cluster, it could provide the following output:

```javascript
{
  "ok" : true,
  "cluster_name" : "test-cluster-MacBook-Air-de-David.local",
  "nodes" : {
    "lodYfbFTRnmwE6rjWGGyQQ" : {
      "name" : "node1",
      "transport_address" : "inet[/172.18.58.139:9300]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9200]",
      "plugins" : [ ]
    },
    "hJLXmY_NTrCytiIMbX4_1g" : {
      "name" : "node4",
      "transport_address" : "inet[/172.18.58.139:9303]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9203]",
      "plugins" : [ {
        "name" : "test-plugin",
        "description" : "test-plugin description",
        "site" : true,
        "jvm" : false
      }, {
        "name" : "test-no-version-plugin",
        "description" : "test-no-version-plugin description",
        "site" : true,
        "jvm" : false
      }, {
        "name" : "dummy",
        "description" : "No description found for dummy.",
        "url" : "/_plugin/dummy/",
        "site" : false,
        "jvm" : true
      } ]
    },
    "bnoySsBfTrSzbDRZ0BFHvg" : {
      "name" : "node2",
      "transport_address" : "inet[/172.18.58.139:9301]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9201]",
      "plugins" : [ {
        "name" : "dummy",
        "description" : "This is a description for a dummy test site plugin.",
        "url" : "/_plugin/dummy/",
        "site" : false,
        "jvm" : true
      } ]
    },
    "0Vwil01LSfK9YgRrMce3Ug" : {
      "name" : "node3",
      "transport_address" : "inet[/172.18.58.139:9302]",
      "hostname" : "MacBook-Air-de-David.local",
      "version" : "0.90.0.Beta2-SNAPSHOT",
      "http_address" : "inet[/172.18.58.139:9202]",
      "plugins" : [ {
        "name" : "test-plugin",
        "description" : "test-plugin description",
        "site" : true,
        "jvm" : false
      } ]
    }
  }
}
```

Information is cached for 10 seconds by default. Modify the `plugins.info_refresh_interval` property if needed.
Setting `plugins.info_refresh_interval` to `-1` will cause infinite caching.
Setting `plugins.info_refresh_interval` to `0` will disable caching.
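
The three behaviours of `plugins.info_refresh_interval` described above (positive interval, `-1`, and `0`) can be sketched as a small time-based cache. This is a minimal Python illustration, not the code added by this commit: the `PluginInfoCache` class, its `loader` callback and the injectable `clock` are all hypothetical names introduced for the example.

```python
import time


class PluginInfoCache:
    """Caches the result of loader() according to a refresh interval in seconds."""

    def __init__(self, loader, refresh_interval=10.0, clock=time.monotonic):
        self._loader = loader
        self._interval = refresh_interval
        self._clock = clock
        self._loaded = False
        self._value = None
        self._loaded_at = 0.0

    def _needs_reload(self, now):
        if not self._loaded:
            return True   # nothing cached yet
        if self._interval == 0:
            return True   # 0 disables caching
        if self._interval < 0:
            return False  # -1 caches forever
        return now - self._loaded_at >= self._interval

    def get(self):
        now = self._clock()
        if self._needs_reload(now):
            self._value = self._loader()
            self._loaded_at = now
            self._loaded = True
        return self._value
```

Passing the clock in as a parameter keeps the sketch testable without sleeping; a caller would normally rely on the default `time.monotonic`.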

Closes #2668.
2013-04-05 11:36:56 +02:00
Simon Willnauer f3e6fe094a beef up term facet tests 2013-04-05 11:05:24 +02:00
Simon Willnauer 9fbe075aec Added test that compares concurrent facet execution results with a serial execution result 2013-04-05 10:36:53 +02:00
Shay Banon 5af6343697 allow to disable the optimization of removal of ords on single value numerics/geo field data
field data settings in the mappings can have ordinals=always option
2013-04-05 00:44:07 +02:00
Shay Banon 54f685674b Thread Pool: Update default settings (move from default cached to fixed)
closes #2858
2013-04-04 23:24:49 +02:00
Simon Willnauer f1dd867c4f Catch Throwable when listener is called rather than Exception to prevent possible hangs if fatal exceptions or errors are thrown 2013-04-04 22:58:38 +02:00
Shay Banon a206aa4548 Settings / Config: Allow to explicitly specify external environment variable syntax, in which case its optional
fixes #2855
2013-04-04 16:30:24 +02:00
Simon Willnauer d758401add Cleanup ScriptDocValues. This commit adds a getValues method to all ScriptDocValues for easy access
in scripts via doc['field'].values / value.
2013-04-04 16:07:54 +02:00
Alexander Reelsen 4f96b36376 Returning configuration of root field mappers toXContent method only if they are enabled 2013-04-04 15:55:12 +02:00
Alexander Reelsen fbdf89c636 Fix for ttl fieldmapper to support disabling correctly. Also returning only booleans, not enums in toXContent 2013-04-04 12:27:23 +02:00
Alexander Reelsen 230cbd3448 Merge branch 'field-mappers-fix' into master
This fixes #2136 and allows disabling the timestamp, index and size field mappers at runtime.
2013-04-04 10:00:15 +02:00
Alexander Reelsen 955788e9a5 Allowing to disable size field mapper after enabling 2013-04-04 09:41:41 +02:00
Alexander Reelsen e662e4d55d Allowing to disable index field mapper after enabling 2013-04-04 09:41:41 +02:00
Alexander Reelsen 9cc2563d5e Allowing to disable timestamp field mapper after enabling 2013-04-04 09:41:41 +02:00
Simon Willnauer 223ec2c42d Beef up FieldData tests by running one on one duels 2013-04-03 18:38:25 +02:00
Igor Motov 356329df00 Improve stability of ClusterHealthTests 2013-04-03 12:07:42 -04:00
Igor Motov d2f6349dcf Improve stability of MinimumMasterNodesTests 2013-04-03 11:51:28 -04:00
Martijn van Groningen 0a89c80554 Fixed issue where a doc is omitted from the hits if it has no geo point and sorting is based on geo distance.
Closes #2851
2013-04-03 17:25:16 +02:00
Martijn van Groningen f7d68e8252 Added `clean` to mvn command. In some cases when recompiling not all changes are detected (E.g. file move, already compiled classes don't get this change). 2013-04-03 15:01:39 +02:00
Simon Willnauer bbe619a416 Call onFailure for every exception case even in the case of an error / runtime exception
Closes #2848
2013-04-03 12:25:58 +02:00
Simon Willnauer eb8b38d027 Upgrade to Lucene 4.2.1 2013-04-03 12:22:39 +02:00
Martijn van Groningen af2f31c33e Fixed typo 2013-04-02 22:31:06 +02:00
Martijn van Groningen cf00acf5b0 If no specified index or alias exists and `ignore_indices` is set to `missing` an index missing error is returned instead of resolving to all open indices (e.g. when searching). This breaks backwards comp. with 0.20.x and before.
Closes #2837
2013-04-02 19:06:17 +02:00