When resolving settings values that turn out empty, the setting should be removed entirely. For example, when using `${env.ENV_VAR}` and `ENV_VAR` is not set, the setting should be removed.
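A minimal sketch of the behaviour in `elasticsearch.yml` (the `node.rack` setting and `RACK_ID` variable are illustrative, not part of this change):

```
# resolved from the environment at startup
node.rack: ${env.RACK_ID}
```

If `RACK_ID` is not set in the environment, the `node.rack` setting is now dropped entirely instead of being resolved to an empty value.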
This commit allows setting custom headers in HTTP responses (like
setting the WWW-Authenticate header for basic auth) by adding
RestRequest.addHeader() method.
Closes#2936, Closes#2540
To get the history right: This is based on PR #2723
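With this in place, a basic auth plugin could, for instance, attach a challenge header to an unauthorized response. A hypothetical exchange (status line and realm are illustrative):

```
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="elasticsearch"
```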
Update requests can now be put in the bulk API. All update request options are supported.
Example usage:
```
curl -XPOST 'localhost:9200/_bulk' --data-binary @bulk.json
```
Contents of bulk.json, containing two update request items:
```
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "script" : "counter += param1", "lang" : "js", "params" : {"param1" : 1}, "upsert" : {"counter" : 1}}
```
The `doc`, `upsert` and all script-related options are part of the payload. The `retry_on_conflict` option is part of the header.
Closes#2982
Before this change, the GetField#getValue() method returned a list of values for a multi-valued field if the field values were obtained from source, or if the field was stored and real-time get was used. If the field was stored but non-realtime get was used, GetField#getValue() returned only the first element and GetField#getValues() returned a list of elements. This change makes the behavior consistent: GetField#getValue() now always returns only the first value of the field and GetField#getValues() returns the entire list.
Typically, the main reason to enable allow_primary on a reroute allocation command is to force-create an empty new shard because a shard (and its replicas) was lost. This can't be done today because the shard expects to have a valid index where it is allocated; we need to clear its post-allocation flag to make sure it is allowed to create a fresh index.
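A sketch of such a reroute command (index, shard and node names are placeholders):

```
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [ {
        "allocate" : {
            "index" : "index1",
            "shard" : 0,
            "node" : "node1",
            "allow_primary" : true
        }
    } ]
}'
```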
Lucene provides a set of statistics that depend on the codec / postings format
as well as on the index options used when the field is created / indexed.
If a certain stats value is not available, Lucene returns `-1` instead of the
correct value. We need to ensure that those values are encoded correctly when
we write them as vLongs as well as when we aggregate them.
Closes#3012
NGramTokenizer and NGramTokenFilter are broken with a version < 4.2.
We should still support these filters but should prevent the
StringIndexOutOfBoundsExceptions. Adding these filters to the
FragmentBuilderHelper will allow seamless highlighting on fields indexed
with those tokenizers or token filters.
Lucene's span queries are a different family than 'ordinary' queries
in Lucene. Spans only work with other spans, so smart query
wrapping doesn't work with span queries at all, i.e. we can't wrap them
in a filtered query.
Closes#2994
Today an analysis chain with broken tokenfilters or tokenizers like
WordDelimiterFilter might produce somewhat broken term vectors that cause
`StringIndexOutOfBoundsExceptions` if FastVectorHighlighter is used
since the positions / offsets contract is violated and offsets of highlight
tokens are not increasing but decreasing even if their positions are increasing.
Yet, if we detect such a situation we can re-sort the tokens, which might cause
somewhat odd highlights but doesn't fail hard with a StringIndexOOBException.
Closes#3006
All index metadata APIs have urgent priority when it comes to cluster state updates. We'd like to remove indices as soon as possible to avoid things like unnecessary shard relocations.
This Lucene release introduced a new API on DocIdSetIterator that requires each
implementation to return a `cost` upper bound as a function of the iterated documents.
This API allows for several optimizations during query execution, especially in
conjunction and disjunction queries with min_should_match set.
Closes#2990
Allow getting the source directly using a dedicated REST endpoint without any additional content around it; the endpoint is `{index}/{type}/{id}/_source`.
Note, HEAD now also supports the _source endpoint.
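Example usage (index, type and id are placeholders):

```
curl -XGET 'localhost:9200/index1/type1/1/_source'
curl -I 'localhost:9200/index1/type1/1/_source'
```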
closes#2993, closes#2995
BytesRefOrdValComparator starts at 1 and ends at maxOrd - 1, yet numOrd is defined
as maxOrd - 1, excluding the 0 ord.
This causes wrong sort ords when the bottom of the queue is compared to the next
segment and the greatest term in the new segment is in fact less than the current
queue bottom. If that is true, we treat the values as equal and never include the right
value into the queue.
Closes#2991
This adds support for Lucene span multi term queries. These Lucene queries
allow users to form complicated queries, such as wildcard or prefix
queries, embedded within span queries.
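A sketch of what such a query can look like in the query DSL (assuming the `span_multi` query name; the `user` field is a placeholder):

```json
{
    "query" : {
        "span_multi" : {
            "match" : {
                "prefix" : { "user" : "ki" }
            }
        }
    }
}
```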
Extended WeightFunction#weight to allow dedicated weight calculations per operation. In certain
circumstances it is more efficient / required to ignore certain factors in the weight
calculation to prevent, for instance, relocations that are solely triggered by tie-breakers.
In particular, the primary balance property should not be taken into account if the delta for
early termination is calculated, since otherwise a relocation could be triggered solely by the
fact that two nodes have different numbers of primaries allocated to them.
Closes#2984
A first implementation of adding matchers and helper methods to elasticsearch.
The following ones are supported:

```
assertHitCount(searchResponse, 2);
// helper methods to easily access the first hits
assertFirstHit(searchResponse, hasId("foo"));
assertSecondHit(searchResponse, hasType("foo"));
assertThirdHit(searchResponse, hasIndex("foo"));
// methods to access all other hits
assertSearchHit(searchResponse, 5, hasId("10"));
// same as above, but maybe more readable
assertSearchHit(searchResponse.getHits().getAt(5), hasIndex("foo"));
```
I changed GeoFilterTests to show how it works.
Furthermore I inlined assertHighlight() from HighlighterSearchTests.
The ElasticsearchAssertions class can now be used as a centralized assertion class,
giving every developer a single place to look at.
Custom settings are not always present in the `Settings` that are passed
to `NodeSettingsService.Listener#onRefreshSettings`, so using the defaults
would override custom settings that were set before.
Closes#2973
Fixed testX and testSingleNodeNoFlush by specifying the mapping on index creation instead of using dynamic mapping. Dynamic mapping is updated on the cluster level asynchronously, and if mapping changes are not applied to the cluster state before the node is closed, these changes are not available after the node restarts. While data added in the test is preserved, due to the absence of the mapping the test still fails. This is a known issue that we are not planning to fix at the moment.
Currently realtime GET does not take source includes/excludes into account.
This patch adds support for the source field mapper includes/excludes
when getting an entry from the transaction log. Even though it introduces
a slight performance penalty, it now adheres to the defined configuration
instead of returning all source data when a realtime get is done.
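As a reminder, includes/excludes are defined on the `_source` mapper; a realtime get now applies the same filtering as a non-realtime get. Example mapping (type name and paths are placeholders):

```json
{
    "type1" : {
        "_source" : {
            "includes" : ["path1.*"],
            "excludes" : ["path2.*"]
        }
    }
}
```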
The filter method of XContentMapValues filtered out nested
arrays/lists completely, due to a bug in the filter method that threw
away all data inside such an array.
Closes#2944
This bug was a follow-up problem caused by the filtering of nested arrays
when source exclusion was configured.
Analyzing a numeric field returns UTF-16 representations of
Lucene's numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the Lucene index. Passing a numeric field
to the analysis action is most likely a bug.
Closes#2953, Closes#2952
We use the latest Lucene version as specified in o.e.common.lucene.Lucene,
which must be upgraded with each Lucene release.
This commit adds an assert that fails once the actual Lucene version
that is used is higher than the current release's version.
Lucene ships with a version constant that is mainly used to provide consistent behaviour across Lucene release versions. Lucene's analysis capabilities are commonly applied at index and search time, such that the search-time behaviour should be identical to the index-time behaviour in most cases. Currently Elasticsearch always uses the latest version from Lucene, which can break backwards compatibility with the index for users that rely on behaviour that changed in newer Lucene versions.
Indices should always use the version they were created with unless a different version is explicitly configured.
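For example, assuming the `version` parameter on analyzers, an analyzer can be pinned to a specific Lucene version explicitly (analyzer name is a placeholder):

```json
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "type" : "standard",
                    "version" : "4.1"
                }
            }
        }
    }
}
```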
closes#2945
NGramTokenizer and NGramTokenFilter don't produce broken positions anymore, which prevents certain highlighter bugs that fail with
StringIndexOutOfBoundsExceptions as in #2931.
This commit breaks backwards compatibility in terms of highlighting when NGramTokenFilter is used.
The highlighter will highlight entire terms as produced by the tokenizer instead of the individual
sub-grams. To do sub-gram highlighting, the ngram tokenizer should be used. This behavior was based on
broken NGramTokenFilter behavior which will be fixed in Lucene 4.4 but was ported in this commit
to elasticsearch 0.90. The broken behavior can still be used if a version < LUCENE_42 is used
in the token filter mapping.
Closes#2931
- rename `open` to `open_contexts`; we might have other open stats in the future related to search (Lucene index searchers?)
- add a test to verify it works
The only difference to the Lucene version is that `discreteMultiValueHighlighting` defaults to `true`. Yet
we set this anyway in the HighlightingPhase, such that the classes are obsolete.
Also fixed a bug that produced negative string offsets if a single highlight phrase or term was greater than the fragCharSize.
The fixed BaseFragListBuilder was added as XSimpleFragListBuilder, which triggers an assert once Elasticsearch
upgrades to Lucene 4.3.
In order to handle exceptions correctly when classes are not found, one
needs to handle ClassNotFoundException as well as NoClassDefFoundError
to be sure every possible case is caught. We did not yet cater
for the latter in ImmutableSettings.
This fix simply executes the same logic for both exceptions instead of
bubbling up NoClassDefFoundError.
When specifying minimum_should_match in a multi_match query it was being applied
to the outer bool query instead of to each of the inner field-specific bool queries.
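With this fix, a query like the following (field names are placeholders) applies `minimum_should_match` to each field-specific query:

```json
{
    "query" : {
        "multi_match" : {
            "query" : "quick brown fox",
            "fields" : ["title", "body"],
            "minimum_should_match" : 2
        }
    }
}
```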
Closes#2918
If cluster settings are updated, the REST API returns the accepted values. For
example, updating the `cluster.routing.allocation.disable_allocation` value via
cluster settings:

```
curl -XPUT http://localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : "true"
    }
}'
```

will respond with:

```
{
    "persistent" : {},
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : "true"
    }
}
```
Closes#2907
FieldData is an in-memory representation of the term dictionary in an uninverted form. Under certain circumstances this FieldData representation can grow very large on high-cardinality fields like tokenized full-text. Depending on the use-case, filtering the terms that are held in the FieldData representation can heavily improve execution performance and application stability.
FieldData Filters can be applied on a per-segment basis. During FieldData loading the terms enumeration is passed through a filter predicate that either accepts or rejects a term.
## Frequency Filter
The frequency filter acts as a high / low pass filter based on the document frequencies of a certain term within the segment that is loaded into field data. It allows rejecting terms that have a very high or very low frequency, based on absolute frequencies or percentages relative to the number of documents in the segment, or, more precisely, the number of documents that have at least one value in the field that is loaded for the current segment.
Here is an example mapping:
```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.frequency.min=0.001;filter.frequency.max=0.1",
                "index" : "analyzed"
            }
        }
    }
}
```
### Parameters
* `filter.frequency.min` - the minimum document frequency (inclusive) required for a term to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
* `filter.frequency.max` - the maximum document frequency (inclusive) allowed for a term to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
* `filter.frequency.min_segment_size` - the minimum number of documents in a segment in order for the filter to be applied. Segments with fewer documents are not filtered.
## Regular Expression Filter
The regular expression filter applies a regular expression to each term during loading and only loads terms into memory that match the given regular expression.
Here is an example mapping:
```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.regex=^en_.*",
                "index" : "analyzed"
            }
        }
    }
}
```
Closes#2874
Note: This has been disabled by default and is therefore not included in a
standard build. The main reason for this is that you need to have an RPM
binary and the RPM development packages installed, which is not the case
on many systems.
The package contains an init.d script as well as systemd configurations.
You can build your own RPM package simply by running `mvn rpm:rpm`.
* In the case where only should clauses were specified with specific types of filters, the first clause determined which documents matched (see the sketch below).
* In some cases the 'at least one should clause must match' behaviour was broken.
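For example, a `bool` filter of this shape (field names and values are placeholders) was affected; previously only documents matching the first should clause were returned, now a document matching any should clause matches:

```json
{
    "filter" : {
        "bool" : {
            "should" : [
                { "term" : { "tag" : "wow" } },
                { "term" : { "tag" : "elasticsearch" } }
            ]
        }
    }
}
```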
Have an explicit warmer threadpool that is dedicated to executing warmers. Currently, warming uses the search threadpool, which does not work well since the number of concurrent searches should be separate from the number of concurrent warmers allowed; also, the characteristics of the search pool (for example, a bounded queue_size) might not fit well with how warmers should be executed (they should not be "rejected"). An example configuration follows.
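A sketch of tuning the new pool in `elasticsearch.yml` (the exact setting names follow the usual `threadpool.*` convention and are an assumption here):

```
# dedicated pool for executing index warmers
threadpool.warmer.type: scaling
threadpool.warmer.size: 5
```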
closes#2815
The `geo_shape` precision could only be set via `tree_levels` so far. A new option `precision` now allows setting the levels of the underlying tree structure by a distance such as `50m`. The default precision is `50m`.
## Example
```json
curl -XPUT 'http://127.0.0.1:9200/myindex/' -d '{
    "mappings" : {
        "type1" : {
            "dynamic" : "false",
            "properties" : {
                "location" : {
                    "type" : "geo_shape",
                    "geohash" : "true",
                    "store" : "yes",
                    "precision" : "50m"
                }
            }
        }
    }
}'
```
## Changes
- GeoUtils defines the [WGS84](http://en.wikipedia.org/wiki/WGS84) reference ellipsoid of the earth
- DistanceUnits refer to a more precise definition of the earth's circumference
- DistanceUnits for inch, yard and meter have been defined
- Set default levels in GeoShapeFieldMapper to 50m precision
Closes#2803
This happens, for example, because we list assigned shards that might not have been allocated on the relevant node yet; there is no need to list those as actual failures in some APIs.