4920 Commits

Author SHA1 Message Date
Simon Willnauer a89230945f Add NGramTokenizer and NGramTokenFilter to broken chains
NGramTokenizer and NGramTokenFilter are broken with a version < 4.2.
We should still support these filters but should prevent the StringIOOB
exceptions. Adding these filters to the FragmentBuilderHelper will
allow seamless highlighting on fields indexed with those tokenizers or
token filters.
2013-05-08 17:57:20 +02:00
chilling b7cd8a64cd Merge pull request #3009 from chilling/issue2986_scores
Fixed parsing of track_scores in RestSearchAction
2013-05-08 03:54:16 -07:00
Florian Schilling 19fab7cd0e Fixed parsing of `track_scores` in `RestSearchAction`
Closes #2986
2013-05-08 12:45:32 +02:00
Simon Willnauer c1e8d4787a Don't use smart query wrapping for span term query
Lucene's span queries are a different family than 'ordinary' queries
in Lucene. Spans only work with other spans, so smart query
wrapping doesn't work with span queries at all, i.e. we can't wrap
them in a filtered query.

Closes #2994
2013-05-07 22:52:57 +02:00
Simon Willnauer 992a40cbd8 Add `field_masking_span` to IndexQueryModule
The query parser for `field_masking_span` has never been added / bound to
the IndexQueryModule.

Closes #3007
2013-05-07 21:50:47 +02:00
Igor Motov 32abf6b890 Fix error getting array fields
Fixes #3000
2013-05-07 13:30:29 -04:00
Simon Willnauer 130f0f6afd Remove Java 7 only API
We still run on Java 6 as minimum requirement. Integer.compare(int,int)
was added in Java 7. This caused compile errors on CI.
2013-05-07 18:40:56 +02:00
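The Java 6 constraint described in commit 130f0f6afd can be illustrated with a small stand-in for `Integer.compare(int,int)`. This is only a sketch: the class name `IntCompare` is hypothetical, and the commit does not say which replacement was actually used.

```java
// Hypothetical helper, not taken from the Elasticsearch code base.
public final class IntCompare {
    private IntCompare() {}

    // Java 6 compatible replacement for Integer.compare(int, int),
    // which was only added in Java 7. Explicit comparisons are used
    // instead of "a - b", which can overflow for large magnitudes.
    public static int compare(int a, int b) {
        return a < b ? -1 : (a == b ? 0 : 1);
    }
}
```

Explicit comparisons keep the result correct even for operands such as `Integer.MIN_VALUE`, where subtraction would wrap around.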
uboness 74317fec8b Fixed custom hunspell dictionary directory
Properly load dictionaries based on the "indices.analysis.hunspell.dictionary.location" setting, if one exists
2013-05-07 17:21:37 +02:00
Simon Willnauer e1b66b34ea Don't fail hard if broken analysis is used.
Today an analysis chain with broken tokenfilters or tokenizers like
WordDelimiterFilter might produce somewhat broken term vectors that cause
`StringIndexOutOfBoundsExceptions` if FastVectorHighlighter is used
since the positions / offsets contract is violated and offsets of highlight
tokens are not increasing but decreasing even if their positions are increasing.

Yet, if we detect such a situation we can re-sort the tokens, which might cause
somewhat odd highlights but doesn't fail hard with a StringIndexOOBException.

Closes #3006
2013-05-07 16:29:00 +02:00
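The recovery idea in commit e1b66b34ea (detect decreasing offsets, then re-sort) might look roughly like the sketch below. The `Token` type and all method names are hypothetical and stand in for the highlighter's real term vector data. Sorting by start offset is one plausible reading of "re-sort", not a confirmed detail of the fix.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not the FastVectorHighlighter code.
public final class OffsetResort {
    private OffsetResort() {}

    public static final class Token {
        public final String term;
        public final int startOffset;
        public final int endOffset;

        public Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Returns true if start offsets never decrease from one token to the next.
    public static boolean offsetsIncrease(List<Token> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).startOffset < tokens.get(i - 1).startOffset) {
                return false;
            }
        }
        return true;
    }

    // Returns a copy of the tokens ordered by start offset (stable sort).
    public static List<Token> sortedByOffset(List<Token> tokens) {
        List<Token> copy = new ArrayList<Token>(tokens);
        Collections.sort(copy, new Comparator<Token>() {
            public int compare(Token a, Token b) {
                return a.startOffset < b.startOffset ? -1
                        : (a.startOffset == b.startOffset ? 0 : 1);
            }
        });
        return copy;
    }
}
```

A caller would only pay for the sort when `offsetsIncrease` reports a violation, so well-behaved analysis chains are unaffected.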
uboness 14ae2fb765 Changed the priority of delete-index action to URGENT
All index meta data APIs have urgent priority when it comes to cluster state updates. We'd like to remove indices asap to avoid things like unnecessary shard relocations
2013-05-07 05:43:08 +02:00
Simon Willnauer 758a4fcdc0 Enable Geo-Shape Relations Within and Disjoint 2013-05-06 18:03:16 +02:00
Simon Willnauer 2219925485 Upgrade to Lucene 4.3.0
This Lucene release introduced a new API on DocIdSetIterator that requires each
implementation to return a `cost` upper bound as a function of the iterated documents.
This API allows for several optimizations during query execution, especially in
Conjunction and Disjunction Queries with min_should_match set.

Closes #2990
2013-05-06 18:03:16 +02:00
Shay Banon f566527513 Rest Get Source
Allows getting the source directly using a specific REST endpoint without any additional content around it. The endpoint is `{index}/{type}/{id}/_source`.
Note, HEAD now also supports the _source endpoint.
closes #2993, closes #2995
2013-05-06 14:33:23 +02:00
Derek McNeil fbd732cde2 Added support for Collections in TermsQuery/InQuery. 2013-05-06 10:30:30 +02:00
Simon Willnauer 3c995d5dcc Expose Lucene Main Version via Main Action. A call to `/` will
return the version of the used Lucene library next to the Elasticsearch
version.

Closes #2988
2013-05-06 09:48:08 +02:00
Simon Willnauer 29da615afd Use full ord range in binary search. The upperbound of the binary search in
BytesRefOrdValComparator starts at 1 and ends at maxOrd - 1. Yet, numOrd is defined
as maxOrd - 1 excluding the 0 ord.

This causes wrong sort ords when the bottom of the queue is compared to the next
segment and the greatest term in the new segment is in fact less than the current
queue bottom. If that is true we treat the values as equal and never include the right
value in the queue.

Closes #2991
2013-05-05 00:48:10 +02:00
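The range issue described in commit 29da615afd can be sketched with a plain binary search over an ord-indexed term array. This is an assumption-laden illustration, not the `BytesRefOrdValComparator` code. The class `OrdSearch`, the `String[]` layout, and the convention that index 0 is the reserved "missing" ord are all hypothetical.

```java
// Hypothetical illustration of searching the full ord range.
public final class OrdSearch {
    private OrdSearch() {}

    // termsByOrd[0] is reserved (the 0 ord); real terms live at ords
    // 1..termsByOrd.length - 1 in sorted order.
    // Returns the ord of the key, or -(insertionPoint + 1) if absent.
    public static int binarySearch(String[] termsByOrd, String key) {
        int low = 1;
        int high = termsByOrd.length - 1; // inclusive: covers the greatest ord
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = termsByOrd[mid].compareTo(key);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }
}
```

If `high` stopped one ord short, the greatest term in the segment would never be examined, and a key larger than every term could not be told apart from a key equal to the last term inspected.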
Igor Motov f92c53efdb Accept loopback interfaces in the network.host setting
Closes #2924. Adds support for loopback interfaces such as _lo0_ in network.host and other network settings.
2013-05-03 14:36:49 -04:00
Martijn van Groningen f22510cab5 A neater approach for processing should clauses before must or must_not clauses. 2013-05-03 18:25:32 +02:00
Martijn van Groningen 52edc4c652 Fixed issue where 'fast' should filter can make documents that didn't match the must or must_not clause a match again. Relates to #2979 2013-05-03 17:37:41 +02:00
Alexander Reelsen 70355f693f Refactoring SpanMultiTermQuery support
* Added license headers where needed
* Refactored SpanMultiTermQueryParser
* Refactored tests to adhere to other tests
2013-05-03 16:00:51 +02:00
Anton Hägerstrand e30aa6b221 Support SpanMultiTerm, closes #2610, #2400
This adds support for lucene span multi term queries. This lucene query
allows users to form complicated queries such as wildcards or prefix
queries embedded within span queries.
2013-05-03 16:00:31 +02:00
Igor Motov ed289dc6c7 Improve stability of SimpleDataNodesTests
Make sure that we are waiting for the new state to be propagated to the node where we are executing the followup query that depends on this state.
2013-05-03 09:14:09 -04:00
Simon Willnauer c9c10273a6 Introduced an Operation enum that is passed to each call of
WeightFunction#weight to allow dedicated weight calculations per operation. In certain
circumstances it is more efficient / required to ignore certain factors in the weight
calculation, for instance to prevent relocations that are solely triggered by tie-breakers.
In particular, the primary balance property should not be taken into account if the delta for
early termination is calculated, since otherwise a relocation could be triggered solely by the
fact that two nodes have different numbers of primaries allocated to them.

Closes #2984
2013-05-03 14:37:47 +02:00
Alexander Reelsen ad92d82680 Added a first small set of hamcrest matchers
A first implementation of matchers and helper methods for elasticsearch.
The following ones are supported:

assertHitCount(searchResponse, 2);

// helper methods to easily access the first hits
assertFirstHit(searchResponse, hasId("foo"));
assertSecondHit(searchResponse, hasType("foo"));
assertThirdHit(searchResponse, hasIndex("foo"));

// methods to access all other hits
assertSearchHit(searchResponse, 5, hasId("10"));
// same as above, but maybe more readable
assertSearchHit(searchResponse.getHits().getAt(5), hasIndex("foo"));

I changed GeoFilterTests to show how it works.

Furthermore I inlined assertHighlight() from HighlighterSearchTests.
The ElasticsearchAssertions class can now be used as a centralized assertion class,
so that every developer has a single class to look at.
2013-05-03 09:29:56 +02:00
Simon Willnauer 345b63e2d0 Use less aggressive threshold to prevent primary relocation in recovery test 2013-05-02 17:51:21 +02:00
Simon Willnauer 72982d955a Use current settings as default in BalancedShardsAllocator instead of defaults.
Custom settings are not always present in the `Settings` that are passed
to `NodeSettingsService.Listener#onRefreshSettings` such that using the defaults
will necessarily override the custom settings if set before.

Closes #2973
2013-05-02 16:23:48 +02:00
uboness 58bc21a216 Added tests for hunspell token filter factory 2013-05-02 14:10:30 +02:00
Martijn van Groningen 59a741cee5 Properly cache parent/child queries in the case they are wrapped in a compound filter.
Closes #2971
2013-05-02 12:08:54 +02:00
uboness f430953ca1 Changed hunspell token filter factory to use "dedup = true" by default 2013-05-01 23:57:14 +02:00
Martijn van Groningen 0d3b7871df Added support for sort_mode `avg` for sorting by geo_distance.
Closes #2962
2013-05-01 12:53:31 +02:00
Martijn van Groningen c21ab1a9cf Return proper response code for delete by query api in the case of failures.
Closes #2963
2013-05-01 11:53:40 +02:00
Igor Motov 6437c51501 Improve stability of SimpleRecoveryLocalGatewayTests
Fixed testX and testSingleNodeNoFlush by specifying mapping on index creation instead of using dynamic mapping. Dynamic mapping is updated on the cluster level asynchronously, and if mapping changes are not applied to the cluster state before the node is closed, these changes are not available after node restart. While data added in the test is preserved, the test still fails due to the absence of the mapping. This is a known issue that we are not planning to fix at the moment.
2013-04-30 12:11:30 -04:00
Alexander Reelsen a694e97ab9 Support source include/exclude for realtime GET
Currently realtime GET does not take source includes/excludes into account.
This patch adds support for the source field mapper includes/excludes
when getting an entry from the transaction log. Even though it introduces
a slight performance penalty, it now adheres to the defined configuration
instead of returning all source data when a realtime get is done.
2013-04-30 17:48:03 +02:00
Alexander Reelsen d5f4c8230d XContentMapValues.filter now works with nested arrays
The filter method of XContentMapValues actually filtered out nested
arrays/lists completely due to a bug in the filter method, which threw
away all data inside of such an array.

Closes #2944
This bug was a follow up problem, because of the filtering of nested arrays
in case source exclusion was configured.
2013-04-30 17:33:09 +02:00
Simon Willnauer 773ea0306b Fail with IAE if a numeric field is used for the analysis endpoint.
Analysing a numeric field will return UTF-16 representations of
Lucene's numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the Lucene index. Passing a numeric field
to the analysis action is most likely a bug.

Closes #2953 #2952
2013-04-30 16:07:11 +02:00
Simon Willnauer 8c6ba59b83 Upgrade Lucene Version to 4.2. The latest Elasticsearch version must
use the latest Lucene version as specified in o.e.common.lucene.Lucene
and must be upgraded with each lucene release.

This commit adds an assert that fails once the actual lucene version
that is used is higher than the current releases version.
2013-04-30 14:06:57 +02:00
Simon Willnauer 42b9674d0c added simple test for numeric match query 2013-04-30 13:53:49 +02:00
Shay Banon 6c3bb4dcdd move to 1.0.0.Beta1 snap 2013-04-29 13:51:09 +02:00
Shay Banon cb75ce0caa release 0.90.0 GA 2013-04-29 13:41:43 +02:00
Shay Banon 9ded2405a0 Use Lucene Version that was used to create the index in Analysis
Lucene ships with a version constant that is mainly used to provide consistent behaviour across lucene release versions. Lucene's Analysis capabilities are commonly applied at index and search time such that the search-time behaviour should be identical to the index-time behaviour in most of the cases. Currently ElasticSearch always uses the latest version from Lucene which can break backwards compatibility with the index for users that rely on behaviour that changed in new Lucene version.

Users should always use the version the index was created with unless it's explicitly configured.

closes #2945
2013-04-29 13:18:51 +02:00
Simon Willnauer bd7ff6946e Added X Versions of NGramTokenFilter and NGramTokenizer to ElasticSearch. These versions
don't produce broken positions anymore and prevent certain highlighter bugs that fail with
StringIndexOutOfBoundsExceptions as in #2931

This commit breaks backwards compatibility in terms of highlighting when NGramTokenFilter is used.
The highlighter will highlight the entire terms as produced by the tokenizer instead of the individual
sub-gram. To do sub-gram highlighting, the ngram tokenizer should be used. This behavior was based on
broken NGramTokenFilter behavior which will be fixed in Lucene 4.4 but was ported in this commit
to elasticsearch 0.90. The broken behavior can still be used if a version < LUCENE_42 is used
in the token filter mapping.

Closes #2931
2013-04-27 16:48:25 +02:00
Shay Banon f09ad507a4 open context stats
- rename to open_contexts from open, we might have other open stats in the future related to search (lucene index searchers?)
- add a test to verify it works
2013-04-27 15:09:47 +02:00
Simon Willnauer 8a7f81104f Remove XSimpleFragmentsBuilder and XScoreOrderFragmentsBuilder since the only difference
to the lucene version is that `discreteMultiValueHighlighting` does default to `true`. Yet
we set this anyway in the HighlightingPhase such that the classes are obsolete.
2013-04-26 20:04:38 +02:00
Simon Willnauer 355f80adc9 Added temporary fix for LUCENE-4899 where FastVectorHighlighter failed with StringIndexOutOfBoundsException
if a single highlight phrase or term was greater than the fragCharSize, producing negative string offsets

The fixed BaseFragListBuilder was added as XSimpleFragListBuilder which triggers an assert once Elasticsearch
upgrades to Lucene 4.3
2013-04-26 19:48:48 +02:00
Simon Willnauer 2ed2fab904 Add assert that fails once Elasticsearch upgrades to Lucene 4.3 in order to remove the duplicated class 2013-04-26 19:16:21 +02:00
Alexander Reelsen 90353ceb79 Fixing possible NoClassDefFoundError when trying to load nonexisting classes
In order to handle exceptions correctly, when classes are not found, one
needs to handle ClassNotFoundException as well as NoClassDefFoundError
in order to be sure to have caught every possible case. We did not cater
for the latter in ImmutableSettings yet.

This fix is just executing the same logic for both exceptions instead of
simply bubbling up NoClassDefFoundError.
2013-04-26 10:34:10 +02:00
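The handling described in commit 90353ceb79 (the same logic for both `ClassNotFoundException` and `NoClassDefFoundError`) might be sketched as follows. The class `SafeClassLookup` and the choice to return `null` are hypothetical. The commit does not show what `ImmutableSettings` does after catching either exception.

```java
// Hypothetical helper, not the ImmutableSettings implementation.
public final class SafeClassLookup {
    private SafeClassLookup() {}

    // Tries to load a class by name and returns null if it cannot be found.
    // Both failure modes are handled the same way: ClassNotFoundException is
    // a checked exception, while NoClassDefFoundError is an Error and would
    // otherwise bubble up past a catch of the exception alone.
    public static Class<?> loadOrNull(String name, ClassLoader loader) {
        try {
            return Class.forName(name, false, loader);
        } catch (ClassNotFoundException e) {
            return null;
        } catch (NoClassDefFoundError e) {
            return null;
        }
    }
}
```

The two catch blocks are written separately rather than as a Java 7 multi-catch, in keeping with the Java 6 minimum mentioned elsewhere in this log.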
Alexander Reelsen 22e25cc165 Added stolen time to OsStats output 2013-04-25 10:46:24 +02:00
Shay Banon c4968d7d65 no longer support snappy... 2013-04-25 09:38:58 +02:00
Igor Motov 982b570037 Fix serialization of sync/async replication type 2013-04-25 08:25:31 +02:00
Martijn van Groningen dd12e0b86c If searchContext not set, abort parsing and throw ISE 2013-04-24 10:24:15 +02:00