Commit Graph

4692 Commits

Author SHA1 Message Date
uboness 6f73c93692 Added an option to add arbitrary headers to the client requests
The headers are key/value pairs defined in the settings under the `request.headers` namespace.
2014-08-06 03:33:08 +02:00
mikemccand 06709faff2 Fix false assert trip 2014-08-05 15:47:21 -04:00
Shay Banon f216dc4ab8 [TEST] make sure all shards have docs
we need that in order for refresh to be effective and actually refresh in the second round of indexing, otherwise, it caches a 0 docs shard and a refresh won't expire anything there
2014-08-05 20:28:47 +02:00
Shay Banon e6e2781ee7 [Query Cache] Add a request level flag to control query cache
A request level flag, defaults to be unset, to control the query cache. When not set, it defaults to the index level settings, when explicitly set, will override the index level setting
closes #7167
2014-08-05 18:28:49 +02:00
markharwood e6b459cb9f Update API enhancement - add support for scripted upserts.
In the case of inserts the UpdateHelper class will now allow the script used to apply updates to run on the upsert doc provided by clients. This allows the logic for managing the internal state of the data item to be managed by the script and is not reliant on clients performing the initialisation of data structures managed by the script.

Closes #7143
2014-08-05 16:52:44 +01:00
Shay Banon 418ce50ec4 Query Cache: Support shard level query response caching
The query cache allow to cache the (binary serialized) response of the shard level query phase execution based on the actual request as the key. The cache is fully coherent with the semantics of NRT, with a refresh (that actually ended up refreshing) causing previous cached entries on the relevant shard to be invalidated and eventually evicted.

This change enables query caching as an opt in index level setting, called `index.cache.query.enable` and defaults to `false`. The setting can be changed dynamically on an index. The cache is only enabled for search requests with search_type count.

The indices query cache is a node level query cache. The `indices.cache.query.size` controls what is the size (bytes wise) the cache will take, and defaults to `1%` of the heap. Note, this cache is very effective with small values in it already. There is also the advanced option to set `indices.cache.query.expire` that allow to control after a certain time of inaccessibility the cache will be evicted.

Note, the request takes the search "body" as is (bytes), and uses it as the key. This means same JSON but with different key order will constitute different cache entries.

This change includes basic stats (shard level, index/indices level, and node level) for the query cache, showing how much is used and eviction rates.

While this is a good first step, and the goal is to get it in, there are a few things that would be great additions to this work, but they can be done as additional pull requests:

- More stats, specifically cache hit and cache miss, per shard.
- Request level flag, defaults to "not set" (inheriting what the setting is).
- Allowing to change the cache size using the cluster update settings API
- Consider enabling the cache to query phase also when asking hits are involved, note, this will only include the "top docs", not the actual hits.
- See if there is a performant manner to solve the "out of order" of keys in the JSON case.
- Maybe introduce a filter element, that is outside of the request, that is checked, and if it matches all docs in a shard, will not be used as part of the key. This will help with time based indices and moving windows for shards that fall "inside" the window to be more effective caching wise.
- Add a more infra level support in search context that allows for any element to mark the search as non deterministic (on top of the support for "now"), and use it to not cache search responses.

closes #7161
2014-08-05 17:45:42 +02:00
Alexander Reelsen 35e67c84fa CORS: Allowed to configure allow-credentials header to work via SSL
This adds support to return the "Access-Control-Allow-Credentials" header
if needed, so CORS will work flawlessly with authenticated applications.

Closes #6380
2014-08-05 17:33:06 +02:00
Shay Banon 1d01b2ac6a [TEST] increase ack timeout on large cluster
when we have a small machine running it with randmoized larger number of nodes (5), we need more time to process it
2014-08-05 16:52:46 +02:00
Lee Hinman 8124bcae1e Make "cluster.routing.allocation.allow_rebalance" a dynamic setting
Also makes it a static constant and changes all tests to use it instead
of a string.

Fixes #7092
2014-08-05 16:26:09 +02:00
Shay Banon 9d79848998 [TEST] add explicit options to no master tests 2014-08-05 13:27:19 +02:00
javanna c788a5e67b [TEST} make sure unicast bw comp test uses specifically set transport port and properly configure unicast hosts
Added also assertBusy block needed since we test the local cluster state on each node, not only on the master node.
2014-08-05 11:47:10 +02:00
Colin Goodheart-Smithe 9c89fcf5a2 Aggregations: key_as_string only shown when format specified in terms agg
The key_as_string field is now not shown in the terms aggregation for long and double fields unless the format parameter is specified

Closes #7125
2014-08-05 10:38:59 +01:00
uboness 0da5cecc3c Added custom transport client settings to test infra
It's now possible to define the additional customesettings for transport clients by extending `transportClientSettings` callback method on `ElasticsearchIntegrationTest`.
2014-08-04 23:58:30 +02:00
javanna 74aa35bdcd [TEST] Fixed GetTermVectorTests, added missing break statements in randomization switch 2014-08-04 17:16:10 +02:00
javanna 24a1a0f07f [TEST] Fixed action names bw comp tests, action not found is expected for newly added exists actions 2014-08-04 17:00:40 +02:00
David Pilato 873a45eaba Search: add time zone setting for relative date math in range filter/query
Filters and Queries now supports `time_zone` parameter which defines which time zone should be applied to the query or filter to convert it to UTC time based value.

When applied on `date` fields the `range` filter and queries accept also a `time_zone` parameter.

The `time_zone` parameter will be applied to your input lower and upper bounds and will move them to UTC time based date:

[source,js]
--------------------------------------------------
{
    "constant_score": {
        "filter": {
            "range" : {
                "born" : {
                    "gte": "2012-01-01",
                    "lte": "now",
                    "time_zone": "+1:00"
                }
            }
        }
    }
}

{
    "range" : {
        "born" : {
            "gte": "2012-01-01",
            "lte": "now",
            "time_zone": "+1:00"
        }
    }
}
--------------------------------------------------

In the above examples, `gte` will be actually moved to `2011-12-31T23:00:00` UTC date.

NOTE: if you give a date with a timezone explicitly defined and use the `time_zone` parameter, `time_zone` will be
ignored. For example, setting `from` to `2012-01-01T00:00:00+01:00` with `"time_zone":"+10:00"` will still use `+01:00` time zone.

Closes #3729.
2014-08-04 15:42:03 +02:00
javanna d2fea5378a Transport: better categorization for transport actions
Our transport relies on action names that tell what we need to do with each message received and sent on any node, together with the content of the request itself.
The action names could use a better categorization and more consistent naming though, the following are the categories introduced with this commit:

- indices: for all the apis that execute against indices
  - admin: for the apis that allow to perform administration tasks against indices
  - data: for the apis that are about data
    - read: apis that read data
    - write: apis that write data
    - benchmark: apis that run benchmarks

- cluster: for all the cluster apis
  - admin: for the cluster apis that allow to perform administration tasks
  - monitor: for the cluster apis that allow to monitor the system

- internal: for all the internal actions that are used from node to node but not directly exposed to users

The change is applied in a backwards compatible manner: we keep the mapping old-to-new action name around, and when receiving a message, depending on the version of the node we receive it from, we use the received action name or we convert it to the previous version (old to new if version < 1.4). When sending a message, depending on the version of the node we talk to, we use the updated action or we convert it to the previous version (new to old if version < 1.4).
For the cases where we don't know the version of the node we talk to, namely unicast ping, transport client nodes info and transport client sniff mode (which calls cluster state), we just use a lower bound for the version, thus we will always use the old action name, which can be understood by both old nodes and new nodes.

Added test that enforces known updated categories for transport action names and test that verifies all action names have a pre 1.4 version for bw compatibility

Added backwards compatibility tests for unicast and transport client in sniff mode, the one for the ordinary transport client (which calls nodes info) is implicit as it's used all the time in our bw comp tests.
Added also backwards comp test that sends an empty message to any of the registered transport handler exposed by older nodes and verifies that what gets back is not ActionNotFoundTransportException, which would mean that there is a problem in the actions mappings.

Added TestCluster#getClusterName abstract method and allow to retrieve externalTransportAddress and internalCluster from CompositeTestCluster.

Closes #7105
2014-08-04 15:24:16 +02:00
mikemccand a58d9a1dd0 Core: simultaneous create/delete against same id can cause silently inconsistent replica
If simultaneous create & delete operations arrive against the same id,
it's possible that primary and replica see those operations in
different orders, which may result in replica throwing
DocumentAlreadyExistsException when the primary didn't which would
lead to replica being inconsistent (missing a document that primary
had indexed).

This push fixes the issue, by never throwing DAEE from the replica on
create.

Closes #7146 #7142
2014-08-04 09:14:09 -04:00
javanna 8989d062cd MultiGet & MultiTermVector api: fail when using no routing and an alias to an index that has routing required (for that doc type)
Made sure that the routing required check is performed against the concrete index, added use of aliases to existing routing tests.

Taken the change to unify the failure message as well to this form: routing is required for [" + index + "]/[" + type + "]/[" + id + "]

Closes #7145
2014-08-04 14:19:19 +02:00
Shay Banon 5795e4fbd7 [TEST] better failure message when back location is missing 2014-08-04 12:50:37 +02:00
Martijn van Groningen c8cc59df57 Percolator should cache index field data instances.
Before the index reader used by the percolator didn't allow to register a CoreCloseListener, but now it does, making it safe to cache index field data cache entries.
Creating field data structures is relatively expensive and caching them can save a lot of noise if many queries are evaluated in a percolator call.

Closes #6806
Closes #7081
2014-08-04 10:23:34 +02:00
Britta Weber 5706858722 Add parameter to GET for checking if generated fields can be retrieved
Fields of type `token_count`, `murmur3`, `_all` and `_field_names` are generated only when indexing.
If a GET requests accesses the transaction log (because no refresh
between indexing and GET request) then these fields cannot be retrieved at all.
Before the behavior was so:

`_all, _field_names`: The field was siletly ignored
`murmur3, token_count`: `NumberFormatException` because GET tried to parse the values from the source.

In addition, if these fields were not stored, the same behavior occured if the fields were
retrieved with GET after a `refresh()` because here also the source was used to get the fields.

Now, GET accepts a parameter `ignore_errors_on_generated_fields` which has
the following effect:
- Throw exception with meaningful error message explaining the problem if set to false (default)
- Ignore the field if set to true
- Always ignore the field if it was not set to stored

This changes the behavior for `_all` and `_field_names` as now an Exception is thrown if a user
tries to GET them before a `refresh()`.

closes #6676
closes #6973
2014-08-04 08:15:34 +02:00
Britta Weber a3cefd919e significant terms: add google normalized distance, add chi square
closes #6858
2014-08-04 08:15:26 +02:00
uboness b667bcdedf fixed platform independent line separator in CliToolTests 2014-08-02 21:02:22 +02:00
Shay Banon 95762e8126 Support "default" for tcpNoDelay and tcpKeepAlive
Allow to set the value default to network.tcp.no_delay and network.tcp.keep_alive so they won't be set at all, since on solaris, setting tcpNoDelay can actually cause failure
relates to #7115
2014-08-02 17:32:41 +02:00
uboness 5ccc7beaf4 Added a cli infrastructure
CliTool is a base class for command-line interface tools (such as the plugin manager and potentially others). It supports the following:
  - single or multi command tool
  - help printing infrastructure (based on help files)
  - consistent mechanism of parsing arguments (based on commons-cli lib)
  - separation of argument parsing and command execution (for easier unit testing)
  - terminal abstraction (will use System.console() when available)
2014-08-02 17:16:27 +02:00
Shay Banon 2d31349ab0 Fix missing break statement causing reroute serialization failure
closes #7135
2014-08-02 16:52:52 +02:00
Areek Zillur b81b240924 [Fix] CompletionMapper throws misleading error on null value
closes #6399
2014-08-01 15:25:05 -04:00
uboness 3c9c9f33e2 Aggregations Added Filters aggregation
A multi-bucket aggregation where multiple filters can be defined (each filter defines a bucket). The buckets will collect all the documents that match their associated filter.

This aggregation can be very useful when one wants to compare analytics between different criterias. It can also be accomplished using multiple definitions of the single filter aggregation, but here, the user will only need to define the sub-aggregations only once.

Closes #6118
2014-08-01 16:01:08 +01:00
Adrien Grand d9d5b35be9 Sort: Make `ignore_unmapped` work for cross-index queries.
Close #2255
2014-08-01 15:30:17 +02:00
Lee Hinman db7b6097cc [TEST] check breaker reset after parent trip instead of trip count 2014-08-01 15:18:38 +02:00
javanna d5b6de3295 Removed support for aliases as part of index settings
Now that we have explicit support for aliases when creating indices and as part of index templates, we may remove support for aliases (only names) as part of index settings. This is partially breaking as the following calls:

curl -XPUT localhost:9200/index -d '{
  "settings" : {
    "aliases" : [ "alias1"]
  }
}

and

curl -XPUT localhost:9200/index -d '{
  "settings" : {
    "index.aliases" : [ "alias1"]
  }
}

were previously supported and will need to be replaced with

curl -XPUT localhost:9200/index -d '{
  "aliases" : {
    "alias1": {}
  }
}

Closes #5545
2014-08-01 13:49:43 +02:00
Skye Book 0040ed4f6f Function score query: Add missing whitespace when throwing exception
DecayFunctionParser throws a parse exception with a string containing "scaleand origin", this fixes the spacing issue.
2014-08-01 12:43:26 +02:00
Adrien Grand a9d5c03924 Aggregations: Fix infinite loop in the histogram reduce logic.
The histogram reduce method can run into an infinite loop if the
Rounding.nextRoundingValue value is buggy, which happened to be the case for
DayTimeZoneRoundingFloor.

DayTimeZoneRoundingFloor is fixed, and the histogram reduce method has been
changed to fail instead of running into an infinite loop in case of a buffy
nextRoundingValue impl.

Close #6965
2014-08-01 08:58:10 +02:00
Adrien Grand 99b32901d2 Mappings: Fix `copy_to` behavior on nested documents.
Today, `copy_to` always copies a field to the current document, which is often
wrong in the case of nested documents. For example, if you have a nested field
called `n` which has a sub-field `n.source` whose content should be copied to
`target`, then the latter field should be created in the root document instead
of the nested one, since it doesn't have `n.` as a prefix. On the contrary, if
you configure the destination field to be `n.target`, then it should go to the
nested document.

Close #6701
2014-08-01 08:57:33 +02:00
Areek Zillur 1d581e6286 Search Exists API: Checks if any matching documents exist for a given query
Implements a new Exists API allowing users to do fast exists check on any matched documents for a given query.
This API should be faster then using the Count API as it will:
 - early terminate the search execution once any document is found to exist
 - return the response as soon as the first shard reports matched documents

closes #6995
2014-07-31 15:42:30 -04:00
David Pilato 85eb0ea0e7 Generate timestamp when path is null
Index process fails when having `_timestamp` enabled and `path` option is set.
It fails with a `TimestampParsingException[failed to parse timestamp [null]]` message.

Reproduction:

```
DELETE test
PUT  test
{
    "mappings": {
        "test": {
            "_timestamp" : {
                "enabled" : "yes",
                "path" : "post_date"
            }
        }
    }
}
PUT test/test/1
{
  "foo": "bar"
}
```

You can define a default value for when timestamp is not provided
within the index request or in the `_source` document.

By default, the default value is `now` which means the date the document was processed by the indexing chain.

You can disable that default value by setting `default` to `null`. It means that `timestamp` is mandatory:

```
{
    "tweet" : {
        "_timestamp" : {
            "enabled" : true,
            "default" : null
        }
    }
}
```

If you don't provide any timestamp value, indexation will fail.

You can also set the default value to any date respecting timestamp format:

```
{
    "tweet" : {
        "_timestamp" : {
            "enabled" : true,
            "format" : "YYYY-MM-dd",
            "default" : "1970-01-01"
        }
    }
}
```

If you don't provide any timestamp value, indexation will fail.

Closes #4718.
Closes #7036.
2014-07-31 19:48:22 +02:00
Brian Murphy 6b39aa615e [TEST] Reduce number of updates and wait after 1st NoNodeException. 2014-07-31 18:01:30 +01:00
Britta Weber fe86c8bc88 _geo_distance sort: allow many to many geo point distance
Add computation of disyance to many geo points. Example request:

```
{
  "sort": [
    {
      "_geo_distance": {
        "location": [
          {
            "lat":1.2,
            "lon":3
          },
          {
             "lat":1.2,
            "lon":3
          }
        ],
        "order": "desc",
        "unit": "km",
        "sort_mode": "max"
      }
    }
  ]
}
```

closes #3926
2014-07-31 17:33:45 +02:00
Shay Banon 739e977aa7 cluster block with auto create index bulk action can cause bulk execution to not return
when there is a cluster block (like no master yet discovered), the bulk action doesn't properly catch the exception of inner execute to notify the listener, causing the bulk operation to hang
closes #7086
2014-07-31 16:57:49 +02:00
Martijn van Groningen 5ea6267883 [TEST] All shards should be allocated before snapshotting & restoring 2014-07-31 16:22:24 +02:00
Martijn van Groningen 1c59ae1b99 [TEST] Increased logging and make use of prepareCreate and assertAcked 2014-07-31 16:22:24 +02:00
Shay Banon 521f8b28b5 [TEST] Concurrent percolation more randomized + safe use of rand 2014-07-31 11:56:36 +02:00
Alex Ksikes e3b3b6c055 Term Vectors API: adds support for wildcards in selected fields
This could useful to generate all term vectors or a chosen set of them.

Closes #7061
2014-07-30 17:44:37 +02:00
Alexander Reelsen 2077d4be48 Packaging: Dont remove ancestors on deb removal
The used -p option could result in accidentally deleting more directories
than /var/lib/elasticsearch - so this option was removed

Note: This only happens if the directories are empty, but still isnt needed.

Relates #5770
2014-07-30 15:01:06 +02:00
Lee Hinman f9f8459c79 [TEST] randomize byte size and thread count in breaker tests 2014-07-30 12:02:16 +02:00
mikemccand dbcc4e9255 Test: finish all merges before test returns to side step 'Delete Index failed - not acked' failure 2014-07-29 11:54:45 -04:00
Simon Willnauer 873b491f4e [TEST] Make default flush opertion blocking 2014-07-29 13:48:50 +02:00
Simon Willnauer af5b2ae28a [TEST] wait for yellow after index was created in BWC tests 2014-07-29 13:48:42 +02:00
Adrien Grand d67e013e08 [TESTS] Don't create an unclosed threadpool in BigArraysTests. 2014-07-29 13:26:53 +02:00
Lee Hinman 5aa2d0cf61 Add support for the `_name` parameter to the simple_query_string query 2014-07-29 12:41:41 +02:00
Lee Hinman 36cf595367 [TESTS] spin in a loop checking request breaker, because multiple clusters could be running 2014-07-29 11:26:34 +02:00
javanna 91c4824a0f Transport client: don't add listed nodes to connected nodes list in sniff mode
This commit effectively reverts e1aa91d , as it is not needed anymore to add the original listed nodes. The cluster state local call made will in fact always return at least the local node (see #6811).

There were a couple of downsides caused by putting the original listed nodes among the connected nodes:
1) in the following retries, they weren't seen as listed nodes anymore, thus the light connect wasn't used
2) among the connected nodes some were "bad" duplicates as they are already there and don't contain all needed info for each node. This was causing serialization problems for instance given that the node version was missing on the `DiscoveryNode` object.

Closes #7067
2014-07-28 21:48:03 +02:00
javanna fcf4d5a38d Transport Client: fixed the node retry mechanism which could fail without trying all the connected nodes
The RetryListener was notified twice for each single failure, which caused some additional retries, but more importantly was making the client reach the maximum number of retries (number of connected nodes) too quickly, meanwhile ongoing retries which could succeed were not completed yet.

The TransportService used to throw ConnectTransportException due to throwConnectException set to true, and also notify the listener of any exception received from a separate thread through the request holder.

Simplified exception handling by just removing the throwConnectException option from the TransportService, used only in the transport client. The transport client now relies solely on the request holder to notify of failures and eventually retry.

Closes #6829
2014-07-28 20:34:46 +02:00
Simon Willnauer eecbf8a559 Add [1.3.2] version constant 2014-07-28 17:22:18 +02:00
Lee Hinman a93ee599d3 [TESTS] fix circuit breaker tests for remote clusters and bwc
Adds additional version checks in NodeStats for older versions

When using an external cluster (backwards compatibility tests), the act
of checking the request breaker requires a network buffer, which
increments the breaker. This change only checks the request breaker in
InternalTestCluster and uses Guice to retrieve it instead of
a (possible) network request.

Also removed the now unused InternalCircuitBreakerService class
2014-07-28 17:18:24 +02:00
javanna 4e5ad568bb Rest: fixed filters execution order to be from lowest to highest rather than the other way around
Closes #7019
2014-07-28 16:54:42 +02:00
javanna 0e9594e02d Internal: use AtomicInteger instead of volatile int for the current action filter position
Also improved filter chain tests to not rely on execution time, and made filter chain tests look more similar to what happens in reality by removing multiple threads creation in testTooManyContinueProcessing (something we don't support anyway, makes little sense to test it).

Closes #7021
2014-07-28 16:54:42 +02:00
David Pilato 264d59c3e2 Plugin Lucene version checker: use `Lucene.parseVersionLenient`
With commit 07c632a2d4dbefe44e8f25dc4ded6cf143d60e41, we now have a new Lucene.parseVersionLenient(String, Version) method which tries to find an existing Lucene version based on the two first digits X.Y of X.Y.Z String.
2014-07-28 16:38:44 +02:00
Colin Goodheart-Smithe 162200f6ed Aggregations: Stops direct subclassing of InternalNumericMetricsAggregation
Must subclass either InternalNumericMetricsAggregation.SingleValue or InternalNumericMetricsAggregation.MultiValue
2014-07-28 14:13:23 +01:00
Itamar Syn-Hershko dd0b42838d [QUERY] Separate parsing impl from setter in SearchParseElement
This commit makes it easier to reuse the inner highlighting, fetch
and rescore parsing logic by plugins or other internal parts.

Closes #3602
2014-07-28 14:53:04 +02:00
Simon Willnauer d403e68f43 add missing import 2014-07-28 14:33:51 +02:00
Simon Willnauer bf7f97d22f [CORE] Support alpha/beta releases in version parsing too
Pull Request #7055 fixed Version parsing for bugfix releases
causing problems with minor version in segments files. Even though
we never release anything with lucene in alpha / beta status this
commit fixes lenient parsing for these cases.

Relates to #7055
2014-07-28 14:04:39 +02:00
Simon Willnauer d2493ea48a [CORE] Support parsing lucene minor version strings
We parse the version that is shipped with the Lucene segments in order
to find the version of lucene that wrote a particular segment. Yet, some lucene
version ie:
 * 4.3.1 (Elasticsearch 0.90.2)
 * 4.5.1 (Elasticsearch 0.90.7)
 * 3.6.1 (pre Elasticsearch 0.90.0)

wrote illegal strings containing the minor version which causes IAE exceptions
being thrown from lucenes parsing method.

Closes #7055
2014-07-28 13:02:00 +02:00
Lee Hinman 07c9b5b08d Change logging level for circuit breaking to warn 2014-07-28 12:10:13 +02:00
Lee Hinman 6abe4c951d Add HierarchyCircuitBreakerService
Adds a breaker for request BigArrays, which are used for parent/child
queries as well as some aggregations. Certain operations like Netty HTTP
responses and transport responses increment the breaker, but will not
trip.

This also changes the output of the nodes' stats endpoint to show the
parent breaker as well as the fielddata and request breakers.

There are a number of new settings for breakers now:

`indices.breaker.total.limit`: starting limit for all memory-use breaker,
defaults to 70%

`indices.breaker.fielddata.limit`: starting limit for fielddata breaker,
defaults to 60%
`indices.breaker.fielddata.overhead`: overhead for fielddata breaker
estimations, defaults to 1.03

(the fielddata breaker settings also use the backwards-compatible
setting `indices.fielddata.breaker.limit` and
`indices.fielddata.breaker.overhead`)

`indices.breaker.request.limit`: starting limit for request breaker,
defaults to 40%
`indices.breaker.request.overhead`: request breaker estimation overhead,
defaults to 1.0

The breaker service infrastructure is now generic and opens the path to
adding additional circuit breakers in the future.

Fixes #6129

Conflicts:
	src/main/java/org/elasticsearch/index/fielddata/IndexFieldData.java
	src/main/java/org/elasticsearch/index/fielddata/IndexFieldDataService.java
	src/main/java/org/elasticsearch/index/fielddata/RamAccountingTermsEnum.java
	src/main/java/org/elasticsearch/index/fielddata/ordinals/GlobalOrdinalsBuilder.java
	src/main/java/org/elasticsearch/index/fielddata/ordinals/InternalGlobalOrdinalsBuilder.java
	src/main/java/org/elasticsearch/index/fielddata/plain/AbstractIndexOrdinalsFieldData.java
	src/main/java/org/elasticsearch/index/fielddata/plain/DisabledIndexFieldData.java
	src/main/java/org/elasticsearch/index/fielddata/plain/IndexIndexFieldData.java
	src/main/java/org/elasticsearch/index/fielddata/plain/NonEstimatingEstimator.java
	src/main/java/org/elasticsearch/index/fielddata/plain/PackedArrayIndexFieldData.java
	src/main/java/org/elasticsearch/index/fielddata/plain/ParentChildIndexFieldData.java
	src/main/java/org/elasticsearch/index/fielddata/plain/SortedSetDVOrdinalsIndexFieldData.java
	src/main/java/org/elasticsearch/node/internal/InternalNode.java
	src/test/java/org/elasticsearch/index/aliases/IndexAliasesServiceTests.java
	src/test/java/org/elasticsearch/index/codec/CodecTests.java
	src/test/java/org/elasticsearch/index/fielddata/AbstractFieldDataTests.java
	src/test/java/org/elasticsearch/index/fielddata/IndexFieldDataServiceTests.java
	src/test/java/org/elasticsearch/index/mapper/MapperTestUtils.java
	src/test/java/org/elasticsearch/index/query/IndexQueryParserFilterCachingTests.java
	src/test/java/org/elasticsearch/index/query/SimpleIndexQueryParserTests.java
	src/test/java/org/elasticsearch/index/query/guice/IndexQueryParserModuleTests.java
	src/test/java/org/elasticsearch/index/search/FieldDataTermsFilterTests.java
	src/test/java/org/elasticsearch/index/search/child/ChildrenConstantScoreQueryTests.java
	src/test/java/org/elasticsearch/index/similarity/SimilarityTests.java
2014-07-28 11:27:33 +02:00
Martijn van Groningen 5631bbb02b [TEST] All shards should be allocated before snapshotting. 2014-07-28 10:48:35 +02:00
Martijn van Groningen 86c0d693c3 [TEST] Ignore Lucene40 codec 2014-07-28 10:40:25 +02:00
Colin Goodheart-Smithe f7b7f67522 Aggregations: fixed value count so it can be used in terms order
Closes #7050
2014-07-28 09:19:01 +01:00
Martijn van Groningen 2e9ee5c937 The `nested` aggregator should also resolve and use the parentFilter of the closest `reverse_nested` aggregator.
Closes #6994
Closes #7048
2014-07-28 10:07:57 +02:00
mikemccand e42b73c6d4 Test: more verbosity for this test on failure 2014-07-26 04:42:26 -04:00
Adrien Grand f682461b2f Mappings: Enforce non-null settings.
No that we are using the index created version to make index-time decisions,
assuming that the version is the current version when settings are null is
very error-prone. Instead we should ensure that settings are always non-null
and contain the version when the index was created.

Close #7032
2014-07-25 21:01:44 +02:00
David Pilato 11eced01da Add multi_field support for Mapper externalValue (plugins)
In context of mapper attachment and other mapper plugins, when dealing with multi fields, sub fields never get the `externalValue` although it was set.

Here is a full script which reproduce the issue when used with mapper attachment plugin:

```
DELETE /test

PUT /test
{
    "mappings": {
        "test": {
            "properties": {
                "f": {
                    "type": "attachment",
                    "fields": {
                        "f": {
                            "analyzer": "english",
                            "fields": {
                                "no_stemming": {
                                    "type": "string",
                                    "store": "yes",
                                    "analyzer": "standard"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

PUT /test/test/1
{
    "f": "VGhlIHF1aWNrIGJyb3duIGZveGVz"
}

GET /test/_search
{
    "query": {
        "match": {
           "f": "quick"
        }
    }
}

GET /test/_search
{
    "query": {
        "match": {
           "f.no_stemming": "quick"
        }
    }
}

GET /test/test/1?fields=f.no_stemming
```

Related to https://github.com/elasticsearch/elasticsearch-mapper-attachments/issues/57

Closes #5402.
2014-07-25 16:59:42 +02:00
Colin Goodheart-Smithe 655157c83a Aggregations: Added an option to show the upper bound of the error for the terms aggregation.
This is only applicable when the order is set to _count.  The upper bound of the error in the doc count is calculated by summing the doc count of the last term on each shard which did not return the term.  The implementation calculates the error by summing the doc count for the last term on each shard for which the term IS returned and then subtracts this value from the sum of the doc counts for the last term from ALL shards.

Closes #6696
2014-07-25 14:24:24 +01:00
Alexander Reelsen a1e335b1e9 CORS: Support regular expressions for origin to match against
This commit adds regular expression support for the allow-origin
header depending on the value of the request `Origin` header.

The existing HttpRequestBuilder is also extended to support the
OPTIONS HTTP method.

Relates #5601
Closes #6891
2014-07-25 10:51:22 +02:00
Alexander Reelsen 35e562343f Tests: Remove HttpClient to only use one Http client
The HTTP client implementation used by the Elasticsearch REST tests is
backed by apache http client instead of a self written helper class,
that uses HttpUrlConnection. This commit removes the old simple HttpClient
class and uses the more powerful and reliable one for all tests.

It also fixes a minor bug, that when sending a 301 redirect, a Location
header needs to be added as well, which was uncovered by the switching
to the new client.

Closes #7003
2014-07-25 10:26:52 +02:00
Adrien Grand 51fd2f513c [TESTS] Fix NPE in FreqTermsEnumTests. 2014-07-25 09:12:01 +02:00
Martijn van Groningen a0e5684d7b [TEST] more logging 2014-07-25 01:16:32 +02:00
Adrien Grand a3d8022dc5 Fielddata: Fix thread safety issue with field data on the `_index` field. 2014-07-24 19:04:22 +02:00
Lee Hinman 89e03910f4 Add a periodic cleanup thread for IndexFieldCache caches
Fixes #7010
2014-07-24 17:23:52 +02:00
Martijn van Groningen 297a97cd23 Core: Use the provided cluster state instead of fetching a new cluster state from cluster service.
Close #7013
2014-07-24 16:23:42 +02:00
Colin Goodheart-Smithe 5483c62de6 Geo: Fixes parse error with complex shapes
The bug reproduces when the point under test for the placement of the hole of the polygon has an x coordinate which only intersects with the ends of edges in the main polygon. The previous code threw out these cases as not relevant but an intersect at 1.0 of the distance from the start to the end of an edge is just as valid as an intersect at any other point along the edge.  The fix corrects this and adds a test.

Closes #5773
2014-07-24 15:17:55 +01:00
Simon Willnauer bd51d7a07f Add `wait_if_ongoing` option to _flush requests
This commit adds the ability to force blocking on the flush operaition
to make sure all files have been written and synced to disk. Without
this option a flush might be executing at the same time causing the
current flush to fail and return before all files being synced.

Closes #6996
2014-07-24 15:34:53 +02:00
Colin Goodheart-Smithe 127649d174 Aggregations: Added pre and post offset to histogram aggregation
Added preOffset and postOffset parameters to the API for the histogram aggregation which work in the same way as in the date histogram

Closes #6605
2014-07-24 14:32:33 +01:00
Adrien Grand f5d1e0a37d [TESTS] Ensure yellow in SimpleFacetsTests.testFilterFacetWithFacetFilterPostMode. 2014-07-24 15:21:20 +02:00
Shay Banon eb37a5992b remove use of recycled set in filters eviction
closes #7012
2014-07-24 15:00:30 +02:00
javanna d9ff42f88a Internal: expose the indices names every action relates to if applicable
Added two new interfaces:
1) IndicesRequest that allows to retrieve the indices the request relates to in a generic manner, together with the indices options that tell how they are going to get resolved and expanded
2) CompositeIndicesRequest for compound requests that hold multiple indices request like MultiSearchRequest, MultiGetRequest, MultiTermVectorsRequest, BulkRequest, BenchmarkRequest, PercolateRequest, MultiPercolateRequest and MoreLikeThisRequest

Taken the chance to streamline the indices options and add them to every request where it makes sense (although they can't be changed from the outside), rather than leaving them implicit in the related TransportAction when indices get expanded (tipycally MetaData#concreteIndices or MetaData#concreteSingleIndex). Added IndicesOptions parameter to MetaData#concreteSingleIndex to make sure it is taken from the request, where the information belongs, instead of hardcoded within MetaData. The concreteSingleIndex method remains but it's just a utility method that returns a single index instead of an array and complains otherwise.

Also made sure NPE is never thrown when setting indices(null) to IndicesAliasesRequest, similar to what SearchRequest does.

Closes #6933
2014-07-24 14:42:40 +02:00
Adrien Grand 6f31b1135a [Benchmark] Make TermsAggregationSearchBenchmark fairer to uninverted field data.
The benchmark indexes 200 unique full-width longs. For uninverted field data
we try to use the most memory-efficient storage, and in that case it would use
two arrays: one for the doc->ordinals mapping and one for the ordinal->value
mapping. Which is slower than what doc values do by storing directly the
mapping from docs to values.
2014-07-24 14:35:47 +02:00
Colin Goodheart-Smithe fdf2bb9371 Aggregations: Better JSON output scoping
Before this change each aggregation had to output an object field with its name and write its JSON inside that object.  This allowed for badly behaved aggregations which could write JSON content in the root of the 'aggs' object.  this change move the writing of the aggregation name to a level above the aggregation itself, ensuring that aggregations can only write within there own scope in the JSON output.

Closes #7004
2014-07-24 12:02:40 +01:00
Robert Muir d8cd755445 Speed up string sort with custom missing value
Today if the user supplies a custom missing value for a string sort,
we do it in an extremely slow way, not using ordinals but dereferencing
bytes for every document. Ordinals are only used if the missing value
is _first or _last.

Instead, use ordinals with custom missing values too.

Closes #7005
2014-07-24 06:27:59 -04:00
Simon Willnauer f130d60b72 [TEST] Don't randomize preference PRIMARY it might not try replicas depending on the clusterstate 2014-07-24 11:36:31 +02:00
Martijn van Groningen 73f7f426de Made `_source` parsing in `top_hits` aggregation consistent with regular `_source` parsing in search api.
Closes #6997
2014-07-24 11:23:59 +02:00
Adrien Grand 8cb4471cca [TESTS] Add more assertions to SimpleFacetsTests. 2014-07-24 11:13:53 +02:00
Brian Murphy ce864d4016 [REFACTOR] TransportActions
Get rid of boilerplate code for handling transport actions.
Make these transport actions extend HandledTransportAction where this code
now lives.
2014-07-24 11:05:29 +01:00
javanna 3e30fa2089 Internal: streamline use of IndexClosedException when executing operation on closed indices
Single index operations to use the newly added IndexClosedException introduced with #6475. This way we can also fail faster when we are trying to execute operations on closed indices and their use is not allowed (depending on indices options). Indices blocks are still checked but we can already throw error while resolving indices (MetaData#concreteIndices).

Effectively this change also affects what we return when using one of the following apis: analyze, bulk, index, update, delete, explain, get, multi_get, mlt, term vector, multi_term vector. We now return `{"error":"IndexClosedException[[test] closed]","status":403}` instead of `{"error":"ClusterBlockException[blocked by: [FORBIDDEN/4/index closed];]","status":403}`.

Closes #6988
2014-07-24 10:33:58 +02:00
Colin Goodheart-Smithe dc9e9cb4cc Aggregations: change to default shard_size in terms aggregation
The default shard size in the terms aggregation now uses BucketUtils.suggestShardSideQueueSize() to set the shard size if the user does not specify it as a parameter.

Closes #6857
2014-07-24 07:55:09 +01:00
Areek Zillur 5487c56c70 Search & Count: Add option to early terminate doc collection
Allow users to control document collection termination, if a specified terminate_after number is
set. Upon setting the newly added parameter, the response will include a boolean terminated_early
flag, indicating if the document collection for any shard terminated early.

closes #6876
2014-07-23 15:10:15 -04:00
Robert Muir 66825ac851 Change numeric data types to use SORTED_NUMERIC docvalues type
instead of a custom encoding in BINARY.

In low level benchmarks this is 2x to 5x faster: its also optimized
for the common case where fields actually only contain at most one
value for each document.

Additionally SORTED_NUMERIC doesn't lose values if they appear more
than once, so mathematical computations such as averages are correct.

Closes #6967
2014-07-23 14:55:03 -04:00
Adrien Grand ff2903d2c6 [TEST] Don't recycle in facets.
The recycling happening in facets is done manually and arrays are sometimes not
released. Aggregations do it in a less error-prone way by registering on to the
SearchContext.
2014-07-23 20:20:16 +02:00
Adrien Grand 629f91ae57 Fielddata: goodbye comparators.
This commit removes custom comparators in favor of the ones that are in Lucene.

The major change is for nested documents: instead of having a comparator wrapper
that deals with nested documents, this is done at the fielddata level by having
a selector that returns the value to use for comparison.

Sorting with custom missing string values might be slower since it is using
TermValComparator since Lucene's TermOrdValComparator only supports sorting
missing values first or last. But other than this particular case, this change
will allow us to benefit from improvements on comparators from the Lucene side.

Close #5980
2014-07-23 20:08:36 +02:00