9035 Commits

Author SHA1 Message Date
javanna d9ff42f88a Internal: expose the indices names every action relates to if applicable
Added two new interfaces:
1) IndicesRequest, which allows retrieving the indices the request relates to in a generic manner, together with the indices options that tell how they are going to get resolved and expanded
2) CompositeIndicesRequest, for compound requests that hold multiple indices requests, like MultiSearchRequest, MultiGetRequest, MultiTermVectorsRequest, BulkRequest, BenchmarkRequest, PercolateRequest, MultiPercolateRequest and MoreLikeThisRequest

Took the chance to streamline the indices options and add them to every request where it makes sense (although they can't be changed from the outside), rather than leaving them implicit in the related TransportAction when indices get expanded (typically MetaData#concreteIndices or MetaData#concreteSingleIndex). Added an IndicesOptions parameter to MetaData#concreteSingleIndex to make sure it is taken from the request, where the information belongs, instead of being hardcoded within MetaData. The concreteSingleIndex method remains, but it's just a utility method that returns a single index instead of an array and complains otherwise.

Also made sure NPE is never thrown when setting indices(null) to IndicesAliasesRequest, similar to what SearchRequest does.

Closes #6933
2014-07-24 14:42:40 +02:00
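The idea behind the two interfaces can be illustrated with a minimal sketch. Everything below is a simplified, hypothetical stand-in (the `*Sketch` types, `SingleRequest`, `MultiRequest` and `indicesOf` are invented for illustration and omit the indices options); it is not the actual Elasticsearch code, only one way a caller could collect index names generically from simple and compound requests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a request that relates to some indices.
interface IndicesRequestSketch {
    String[] indices();
}

// Hypothetical stand-in for a compound request holding several sub-requests.
interface CompositeIndicesRequestSketch {
    List<? extends IndicesRequestSketch> subRequests();
}

final class SingleRequest implements IndicesRequestSketch {
    private final String[] indices;

    SingleRequest(String... indices) {
        this.indices = indices;
    }

    @Override
    public String[] indices() {
        return indices;
    }
}

final class MultiRequest implements CompositeIndicesRequestSketch {
    private final List<SingleRequest> items = new ArrayList<>();

    MultiRequest add(SingleRequest request) {
        items.add(request);
        return this;
    }

    @Override
    public List<? extends IndicesRequestSketch> subRequests() {
        return items;
    }
}

final class IndicesOfRequest {
    // Collects index names without knowing the concrete request type.
    static List<String> indicesOf(Object request) {
        List<String> names = new ArrayList<>();
        if (request instanceof IndicesRequestSketch) {
            names.addAll(Arrays.asList(((IndicesRequestSketch) request).indices()));
        } else if (request instanceof CompositeIndicesRequestSketch) {
            for (IndicesRequestSketch sub : ((CompositeIndicesRequestSketch) request).subRequests()) {
                names.addAll(Arrays.asList(sub.indices()));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        MultiRequest multi = new MultiRequest()
                .add(new SingleRequest("logs", "metrics"))
                .add(new SingleRequest("users"));
        System.out.println(indicesOf(multi));
    }
}
```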
Adrien Grand 6f31b1135a [Benchmark] Make TermsAggregationSearchBenchmark fairer to uninverted field data.
The benchmark indexes 200 unique full-width longs. For uninverted field data
we try to use the most memory-efficient storage, and in that case it would use
two arrays: one for the doc->ordinals mapping and one for the ordinal->value
mapping, which is slower than what doc values do by directly storing the
mapping from docs to values.
2014-07-24 14:35:47 +02:00
Colin Goodheart-Smithe fdf2bb9371 Aggregations: Better JSON output scoping
Before this change each aggregation had to output an object field with its name and write its JSON inside that object. This allowed for badly behaved aggregations which could write JSON content in the root of the 'aggs' object. This change moves the writing of the aggregation name to a level above the aggregation itself, ensuring that aggregations can only write within their own scope in the JSON output.

Closes #7004
2014-07-24 12:02:40 +01:00
Robert Muir d8cd755445 Speed up string sort with custom missing value
Today if the user supplies a custom missing value for a string sort,
we do it in an extremely slow way, not using ordinals but dereferencing
bytes for every document. Ordinals are only used if the missing value
is _first or _last.

Instead, use ordinals with custom missing values too.

Closes #7005
2014-07-24 06:27:59 -04:00
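One way to keep comparing ordinals when the missing value is an arbitrary string is to locate that value in the sorted term dictionary once, then compare integers per document. The sketch below is an assumption about how such a scheme could look, not the code from this commit: `MissingValueOrdinals`, `missingKey` and `sortKey` are hypothetical names, a plain sorted `String[]` stands in for the term dictionary, and ordinals are doubled so the missing value can sit between two existing terms.

```java
import java.util.Arrays;

final class MissingValueOrdinals {
    // Resolves the custom missing value to a sort key once per segment.
    // Existing terms get even keys (2 * ord); a missing value that falls
    // between two terms gets the odd key in between.
    static long missingKey(String[] sortedTerms, String missingValue) {
        int pos = Arrays.binarySearch(sortedTerms, missingValue);
        if (pos >= 0) {
            return 2L * pos; // same key as the existing term
        }
        int insertionPoint = -pos - 1;
        return 2L * insertionPoint - 1;
    }

    // Per-document key: a negative ordinal means the document has no value.
    static long sortKey(int ord, long missingKey) {
        return ord < 0 ? missingKey : 2L * ord;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "cherry", "plum"};
        long missing = missingKey(terms, "banana");
        // Documents are compared by key only; no term bytes are read per document.
        System.out.println(sortKey(0, missing) + " < " + sortKey(-1, missing) + " < " + sortKey(1, missing));
    }
}
```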
Simon Willnauer f130d60b72 [TEST] Don't randomize preference PRIMARY, it might not try replicas depending on the cluster state 2014-07-24 11:36:31 +02:00
Martijn van Groningen 73f7f426de Made `_source` parsing in `top_hits` aggregation consistent with regular `_source` parsing in search api.
Closes #6997
2014-07-24 11:23:59 +02:00
Adrien Grand 8cb4471cca [TESTS] Add more assertions to SimpleFacetsTests. 2014-07-24 11:13:53 +02:00
Brian Murphy ce864d4016 [REFACTOR] TransportActions
Get rid of boilerplate code for handling transport actions.
Make these transport actions extend HandledTransportAction where this code
now lives.
2014-07-24 11:05:29 +01:00
javanna 3e30fa2089 Internal: streamline use of IndexClosedException when executing operation on closed indices
Single index operations now use the IndexClosedException introduced with #6475. This way we can also fail faster when we are trying to execute operations on closed indices and their use is not allowed (depending on indices options). Indices blocks are still checked, but we can already throw an error while resolving indices (MetaData#concreteIndices).

Effectively this change also affects what we return when using one of the following apis: analyze, bulk, index, update, delete, explain, get, multi_get, mlt, term vector, multi_term vector. We now return `{"error":"IndexClosedException[[test] closed]","status":403}` instead of `{"error":"ClusterBlockException[blocked by: [FORBIDDEN/4/index closed];]","status":403}`.

Closes #6988
2014-07-24 10:33:58 +02:00
Colin Goodheart-Smithe dc9e9cb4cc Aggregations: change to default shard_size in terms aggregation
The default shard size in the terms aggregation now uses BucketUtils.suggestShardSideQueueSize() to set the shard size if the user does not specify it as a parameter.

Closes #6857
2014-07-24 07:55:09 +01:00
Areek Zillur 5487c56c70 Search & Count: Add option to early terminate doc collection
Allow users to terminate document collection early by setting a terminate_after number.
When the newly added parameter is set, the response will include a boolean terminated_early
flag, indicating whether the document collection for any shard terminated early.

closes #6876
2014-07-23 15:10:15 -04:00
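A rough sketch of early-terminating collection follows. It is illustrative only: `TerminateAfterSketch`, `Result` and `collect` are hypothetical names, an `int[]` stands in for the matching documents of one shard, and the exact condition under which the flag is set (here: collection stopped before all matches were seen) is an assumption rather than something stated in the commit message.

```java
final class TerminateAfterSketch {
    static final class Result {
        final int collected;
        final boolean terminatedEarly;

        Result(int collected, boolean terminatedEarly) {
            this.collected = collected;
            this.terminatedEarly = terminatedEarly;
        }
    }

    // Counts matching documents, stopping once terminateAfter have been collected.
    static Result collect(int[] matchingDocs, int terminateAfter) {
        int collected = 0;
        for (int doc : matchingDocs) {
            if (collected == terminateAfter) {
                return new Result(collected, true); // more matches exist, stop here
            }
            collected++;
        }
        return new Result(collected, false);
    }

    public static void main(String[] args) {
        Result r = collect(new int[] {3, 8, 15, 21, 42}, 3);
        System.out.println(r.collected + " collected, terminated early: " + r.terminatedEarly);
    }
}
```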
Robert Muir 66825ac851 Change numeric data types to use SORTED_NUMERIC docvalues type
instead of a custom encoding in BINARY.

In low level benchmarks this is 2x to 5x faster: it's also optimized
for the common case where fields actually only contain at most one
value for each document.

Additionally SORTED_NUMERIC doesn't lose values if they appear more
than once, so mathematical computations such as averages are correct.

Closes #6967
2014-07-23 14:55:03 -04:00
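The point about averages can be checked with a small worked example. The sketch assumes a document whose field holds the values 2, 2 and 5; the class and method names are hypothetical, and plain arrays stand in for the doc values storage.

```java
import java.util.Arrays;

final class DuplicateValuesAverage {
    // Average over every value occurrence, as when duplicates are kept.
    static double averageKeepingDuplicates(long[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Average over distinct values only, as when repeated values are lost.
    static double averageOfDistinct(long[] values) {
        return Arrays.stream(values).distinct().average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        long[] values = {2, 2, 5};
        System.out.println(averageKeepingDuplicates(values)); // (2 + 2 + 5) / 3
        System.out.println(averageOfDistinct(values));        // (2 + 5) / 2
    }
}
```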
Adrien Grand ff2903d2c6 [TEST] Don't recycle in facets.
The recycling happening in facets is done manually and arrays are sometimes not
released. Aggregations do it in a less error-prone way by registering on to the
SearchContext.
2014-07-23 20:20:16 +02:00
Adrien Grand 629f91ae57 Fielddata: goodbye comparators.
This commit removes custom comparators in favor of the ones that are in Lucene.

The major change is for nested documents: instead of having a comparator wrapper
that deals with nested documents, this is done at the fielddata level by having
a selector that returns the value to use for comparison.

Sorting with custom missing string values might be slower because it uses
TermValComparator, as Lucene's TermOrdValComparator only supports sorting
missing values first or last. But other than this particular case, this change
will allow us to benefit from improvements on comparators from the Lucene side.

Close #5980
2014-07-23 20:08:36 +02:00
Lee Hinman a1a03a184c [DOCS] Fix nested root object indexing documentation
Types can no longer be specified when indexing, see:
https://github.com/elasticsearch/elasticsearch/pull/4552
2014-07-23 18:34:27 +02:00
Adrien Grand 76511158b5 Fielddata: Fix the ordinals impl for sparse fields.
Caused by #6908
2014-07-23 17:39:43 +02:00
Britta Weber 10201d511c [doc] Correct decay function equations in function_score description
Impact of decay and scale was missing from the equations.

Closes #6983
2014-07-23 17:33:22 +02:00
Clinton Gormley 0f943850a0 Update named-queries-and-filters.asciidoc 2014-07-23 17:28:49 +02:00
Simon Willnauer 5bfea56457 [DOCS] move all coming tags to added in master 2014-07-23 16:37:19 +02:00
babeya 81a83aab22 Docs: Update query-string-syntax.asciidoc
Closes #6253
2014-07-23 16:32:32 +02:00
Simon Willnauer b51bd3a645 Add version 1.2.4 and 1.3.1 to the version table 2014-07-23 16:26:48 +02:00
Lee Hinman 6e25a6a7aa [DOCS] clarify /_cat/fielddata REST api documentation 2014-07-23 16:18:37 +02:00
Konrad Feldmeier 48812ff1f2 Reflect that 'field_value_factor' is only in 1.2.x
While the blog post http://www.elasticsearch.org/blog/2014-04-02-this-week-in-elasticsearch/ states that feature #5519 was
added to 1.x, the release notes for, e.g., v1.1.2 say otherwise.
Only the release notes for 1.2.0 list #5519 as a new feature.

Since the 1.x docs deprecate/discourage the use of `_boost`, and seemingly give a migration example at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html#function-score-instead-of-boost
users of 1.1.x should be warned.
2014-07-23 15:49:03 +02:00
Simon Willnauer 0a1701d416 [BUILD] skip bwc version check if directory doesn't exist or is not a directory 2014-07-23 14:54:28 +02:00
Simon Willnauer be96f57c11 [TEST] Fix SimpleThreadPoolTests to exclude test infra threads 2014-07-23 14:44:08 +02:00
Peter Johnson @insertcoffee 9a4abc2620 Docs: typo
example fails in bash

Closes #6977
2014-07-23 12:43:43 +02:00
mikemccand cc4d7c6272 Core: don't load bloom filters by default
This change just changes the default for index.codec.bloom.load to
false: with recent performance improvements to ID lookup, such as
#6298, bloom filters don't give much of a performance gain anymore,
and they can consume non-trivial RAM when there are many tiny
documents.

For now, we still index the bloom filters, so if a given app wants
them back, it can just update the index.codec.bloom.load to true.

Closes #6959
2014-07-23 05:58:41 -04:00
Clinton Gormley 3f9aea883f Docs: Made current version, branch and jdk into asciidoc attributes 2014-07-23 11:55:35 +02:00
Lee Hinman 15ccd787a5 [TEST] Maven reproductions should always include 'clean' target 2014-07-23 11:47:58 +02:00
mikemccand 5ccd44519a Test: make test less evil 2014-07-23 05:35:52 -04:00
mikemccand 55986907e8 Test: add more verbosity when this test fails 2014-07-23 05:23:15 -04:00
Clinton Gormley 17df714229 Docs: Change public signing key instructions to work with sudo
Closes #6823
2014-07-23 11:13:03 +02:00
Clinton Gormley 254aa71693 Docs: Added Tiki Wiki integration
Closes #6746
2014-07-23 11:00:46 +02:00
Clinton Gormley ecb2e181ae Docs: Added Tiki Wiki integration
Closes #6746
2014-07-23 10:57:09 +02:00
Lee Hinman c38a9d73e7 [TEST] Add test for _score and doc[] access in Groovy scripts 2014-07-23 09:58:38 +02:00
Adrien Grand bf4bdcce73 Build: Remove UnsafeUtils from forbidden-apis exclusion list. 2014-07-23 09:30:51 +02:00
Britta Weber 734e656a91 Make _all field accessible with GET
The `_all` field was returned as null, even when stored, if requested with GET like this:

`curl -XGET "http://localhost:9200/test/test/1?fields=_all"`

Instead, it should simply behave like a String field and return the
concatenated fields as String.

closes #6924
2014-07-23 09:16:35 +02:00
Adrien Grand 08f8731b6f Core: Drop UnsafeUtils.
This class potentially does unaligned memory access and does not bring much
now that we switched to global ords for terms aggregations.

Close #6962
2014-07-23 08:41:11 +02:00
Areek Zillur f39d4e1f89 PhraseSuggester: Collate option should allow returning phrases with no matching docs
A new option `prune` has been added to allow users to control phrase suggestion pruning when `collate`
is set. If the new option is set, the phrase suggestion option will contain a boolean `collate_match`
indicating whether the respective result had hits in collation.

Closes #6927
2014-07-22 17:17:15 -04:00
Simon Willnauer 0faffcf372 [TEST] Add simple sort assertions for bwc tests
Today we only do count searches to ensure sane results are returned
after upgrading etc. This change adds sorting to the picture asserting
on simple numeric sorting that uses field data etc. after upgrading.

Relates to #6967
2014-07-22 22:22:09 +02:00
Shay Banon 50ececbbcf Unicast discovery: only disconnect from temporary connected nodes
In unicast discovery, we try to reuse existing discovery nodes based on the node address they have. If we find an existing node based on its address, and for some reason it's not connected, don't add it to the list of nodes to disconnect from, as that (full) connection is useful down the road.
closes #6966
2014-07-22 21:29:57 +02:00
Shay Banon 88f3afe4b5 Fix connect concurrency, can cause connection nodes to close
Looking at the connect code, if 2 threads at the same time try and connect to a node, and both enter sequentially the connectLock code block, the second one would try and put the connection in the map, and close the replaced channels, which will cause the existing connection to close as well (since it removes the node from the connectedNodes map)
To fix this, simply make sure we properly check the existence of the connection within the connectionLock block, so there won't be concurrent connections going on.
While doing this, also went over all the mutation code that handles disconnections, and made sure they are properly done only within a connection lock.
closes #6964
2014-07-22 19:48:47 +02:00
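The fix described above amounts to checking for an existing connection inside the lock, so that a second caller reuses it instead of replacing and closing it. The sketch below illustrates that pattern under simplifying assumptions: `ConnectionRegistry`, `Channel` and `openedCount` are hypothetical, a single lock object stands in for whatever locking the transport really uses, and no real network channels are involved.

```java
import java.util.HashMap;
import java.util.Map;

final class ConnectionRegistry {
    // Hypothetical stand-in for a set of network channels to one node.
    static final class Channel {
        volatile boolean closed;

        void close() {
            closed = true;
        }
    }

    private final Object connectLock = new Object();
    private final Map<String, Channel> connectedNodes = new HashMap<>();
    private int opened;

    Channel connect(String nodeId) {
        synchronized (connectLock) {
            // The existence check happens inside the lock, so a second thread
            // sees the first thread's connection and never replaces it.
            Channel existing = connectedNodes.get(nodeId);
            if (existing != null) {
                return existing;
            }
            Channel channel = new Channel();
            opened++;
            connectedNodes.put(nodeId, channel);
            return channel;
        }
    }

    void disconnect(String nodeId) {
        // Mutations that remove connections take the same lock.
        synchronized (connectLock) {
            Channel channel = connectedNodes.remove(nodeId);
            if (channel != null) {
                channel.close();
            }
        }
    }

    int openedCount() {
        synchronized (connectLock) {
            return opened;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionRegistry registry = new ConnectionRegistry();
        Thread t1 = new Thread(() -> registry.connect("node-1"));
        Thread t2 = new Thread(() -> registry.connect("node-1"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("channels opened: " + registry.openedCount());
    }
}
```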
mikemccand 72b3d6ef75 Test: make sure randomizer doesn't swap in SerialMergeScheduler on us 2014-07-22 13:06:04 -04:00
mikemccand 1e92f0f4ff Core: allow index.merge.scheduler.max_thread_count to be updated dynamically
Lucene allows the max_thread_count to be updated, but this wasn't
fully exposed in Elasticsearch.

Closes #6925
2014-07-22 11:23:46 -04:00
Clinton Gormley f14af3599a Fixed typo in AbstractFieldMapper
similariry -> similarity
2014-07-22 15:54:09 +02:00
Brian Murphy b98f19a54b [DOCS] Fix typo 2014-07-22 14:51:31 +01:00
Brian Murphy 3c5de7d4a1 [DOCS] Fix indentation 2014-07-22 14:49:45 +01:00
Brian Murphy e3b1aed0fc [DOCS] Update examples to groovy. 2014-07-22 14:45:46 +01:00
Brian Murphy 7d9b012ca1 [FIX] Fix update parser to accept script_id 2014-07-22 14:22:57 +01:00
Adrien Grand 3c142e550d Fielddata: Switch to Lucene DV APIs.
This commit removes BytesValues/LongValues/DoubleValues/... and tries to use
Lucene's APIs such as NumericDocValues or RandomAccessOrds instead whenever
possible.

The next step would be to take advantage of the fact that APIs are the same in
Lucene and Elasticsearch in order to remove our custom comparators and use
Lucene's.

There are a few side-effects to this change:
 - GeoDistanceComparator has been removed, DoubleValuesComparator is used instead
   on top of dynamically computed values (was easier than migrating
   GeoDistanceComparator).
 - SortedNumericDocValues doesn't guarantee uniqueness so long/double terms
   aggregators have been updated to make sure a document cannot fall twice in
   the same bucket.
 - Sorting by maximum value of a field or running a `max` aggregation is
   potentially significantly faster thanks to the random-access API.

Our aggs and p/c aggregations benchmarks don't report differences with this
change on uninverted field data. However the fact that doc values don't need
to be wrapped anymore seems to help a lot. For example
TermsAggregationSearchBenchmark reports ~30% faster terms aggregations on doc
values on string fields with this change, which are now only ~18% slower than
uninverted field data although stored on disk.

Close #6908
2014-07-22 15:16:24 +02:00
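The side-effect about uniqueness can be sketched as follows: when the per-document values arrive sorted but may repeat, a terms-style counter has to skip consecutive duplicates so a document is counted at most once per bucket. This is a hedged illustration, not the aggregator code from the commit; `TermsBucketCounter` and `countDocsPerTerm` are hypothetical, and a `long[][]` stands in for the sorted values of each document.

```java
import java.util.HashMap;
import java.util.Map;

final class TermsBucketCounter {
    // Counts, for each term, how many documents contain it at least once.
    // Each inner array holds one document's values in sorted order.
    static Map<Long, Integer> countDocsPerTerm(long[][] sortedValuesPerDoc) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long[] values : sortedValuesPerDoc) {
            for (int i = 0; i < values.length; i++) {
                if (i > 0 && values[i] == values[i - 1]) {
                    continue; // duplicate within the same document, already counted
                }
                counts.merge(values[i], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[][] docs = {{1, 1, 2}, {2, 2}, {3}};
        System.out.println(countDocsPerTerm(docs));
    }
}
```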