The PhraseSuggester can be very slow and CPU intensive if a lot of terms
are suggested. To prevent cluster instability and long-running requests,
this commit adds a hard limit, 10 tokens by default: if the query is parsed
into more tokens than that, no correction is returned anymore.
Closes#3164
Until now, 'named' date formats like dateOptionalTime could not be used as
part of a group of date formats. This patch allows grouping them arbitrarily,
for example:
* yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||dateOptionalTime
* dateOptionalTime||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd
* yyyy/MM/dd HH:mm:ss||dateOptionalTime||yyyy/MM/dd
* date_time||date_time_no_millis
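A hedged sketch of how such a combined format might be declared in a mapping, built here with XContentBuilder (the "tweet" type and "created" field are illustrative, not part of the original change):
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

public class GroupedDateFormatMapping {
    public static void main(String[] args) throws Exception {
        // A date field whose format mixes an explicit pattern with the
        // named dateOptionalTime format, separated by '||'.
        XContentBuilder mapping = XContentFactory.jsonBuilder()
                .startObject()
                    .startObject("tweet")
                        .startObject("properties")
                            .startObject("created")
                                .field("type", "date")
                                .field("format", "yyyy/MM/dd HH:mm:ss||dateOptionalTime")
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject();
        System.out.println(mapping.string());
    }
}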
Closes#2132
Character.codePointAt and codePointBefore have two versions: one that only
accepts an offset, and one that additionally accepts a bound (a limit for
codePointAt, a start for codePointBefore). The former can be dangerous when
working with character buffers: if the offset is the last char of the buffer,
a char outside the buffer might be used to compute the code point. One should
therefore always use the version that accepts a bound.
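A minimal, contrived sketch of the difference:
public class CodePointAtDemo {
    public static void main(String[] args) {
        // Only the first char of this buffer is valid; the second is stale
        // data (a low surrogate left over from an earlier fill).
        char[] buf = { '\uD835', '\uDD0A' };
        int limit = 1;

        // Without a limit, the high surrogate at buf[0] pairs with the stale
        // low surrogate at buf[1] and yields code point 0x1d50a.
        System.out.println(Integer.toHexString(Character.codePointAt(buf, 0)));

        // With a limit, lookahead stops at the end of the valid region and
        // the lone high surrogate itself (0xd835) is returned.
        System.out.println(Integer.toHexString(Character.codePointAt(buf, 0, limit)));
    }
}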
Collections.sort is wasteful on random-access lists: it dumps the data into an
array, sorts the array, and then copies the elements back into the list.
However, the sorting can easily be performed in place by using Lucene's
CollectionUtil.(merge|quick|tim)Sort, as sketched below.
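A minimal sketch of the in-place variant, assuming Lucene's CollectionUtil is on the classpath:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.util.CollectionUtil;

public class InPlaceSortDemo {
    public static void main(String[] args) {
        List<String> terms = new ArrayList<String>(Arrays.asList("zoo", "alpha", "mango"));
        // Sorts the random-access list in place; Collections.sort would first
        // dump the elements into an array and copy them back afterwards.
        CollectionUtil.timSort(terms);
        System.out.println(terms); // [alpha, mango, zoo]
    }
}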
The flag is set to true when a document is new, and to false when it replaces an existing document.
Other minor changes:
Fixed an issue with dynamic gc deletes settings update
Added an assertThrows to ElasticsearchAssertion
Closes#3084, Closes#3154
Both APIs now also support a `local` parameter, which fetches the mapping / warmer from the cluster state of the node that received the request. The `type` option in the get mapping API now also supports wildcards. The warmer API now also supports the `type` option.
Closes#3171
Lucene's MergePolicies support a noCFSRatio. This commit introduces
support for this ratio via `index.compound_format`. This setting
can parse a boolean value or a value in the interval [0..1] that
is equivalent to the noCFSRatio. The settings `1`, `1.0` and `true`
are equivalent, as are `0`, `0.0` and `false`.
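As a hedged sketch, the setting could be supplied like any other index setting, e.g. via a Settings builder (the 0.5 ratio is illustrative):
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;

public class CompoundFormatSetting {
    public static void main(String[] args) {
        // A ratio of 0.5 means segments up to half the index size may use
        // the compound file format; true/false map to 1.0/0.0.
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("index.compound_format", "0.5")
                .build();
        System.out.println(settings.get("index.compound_format"));
    }
}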
Closes#3166
Currently we have many different places that convert Strings to UTF-8
bytes and back. We shouldn't maintain more code than necessary for this
conversion and should rather use Lucene's support for it.
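For instance, Lucene's BytesRef already covers the round trip:
import org.apache.lucene.util.BytesRef;

public class Utf8RoundTrip {
    public static void main(String[] args) {
        // Encode a String to UTF-8 bytes and decode it back using Lucene.
        BytesRef utf8 = new BytesRef("München");
        String decoded = utf8.utf8ToString();
        System.out.println(decoded); // München
    }
}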
Term Vector API
================================
Returns information and statistics on terms in the fields of a particular document as stored in the index.
curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true'
Three types of values can be requested: term information, term statistics and field statistics.
By default, all term information and field statistics are returned for all fields but no term statistics.
Optionally, you can specify the fields for which the information is retrieved, either with a parameter in the URL
curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?fields=text,...'
or by adding the requested fields in the request body (see example below).
Term information
-------------------------
- term frequency in the field (always returned)
- term positions ("positions" : true)
- start and end offsets ("offsets" : true)
- term payloads ("payloads" : true), as base64 encoded bytes
If the requested information wasn't stored in the index, it will be omitted without further warning.
See [mapping](http://www.elasticsearch.org/guide/reference/mapping/core-types/) on how to configure your index to store term vectors.
Term statistics
-------------------------
Setting "term_statistics" to "true" (default is "false") will return
- total term frequency (how often a term occurs in all documents)
- document frequency (the number of documents containing the current term)
By default these values are not returned since term statistics can have a serious performance impact.
Field statistics
-------------------------
Setting "field_statistics" to "false" (default is "true") will omit
- document count (how many documents contain this field)
- sum of document frequencies (the sum of document frequencies for all terms in this field)
- sum of total term frequencies (the sum of total term frequencies of each term in this field)
Behavior
-------------------------
The term and field statistics are not accurate. Deleted documents are not taken into account. The information is only retrieved for the shard the requested document resides in. The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context.
Example
-------------------------
First, we create an index that stores term vectors, payloads, etc.:
curl -s -XPUT 'http://localhost:9200/twitter/' -d '{
  "mappings": {
    "tweet": {
      "properties": {
        "text": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "store" : "yes",
          "index_analyzer" : "fulltext_analyzer"
        },
        "fullname": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "index_analyzer" : "fulltext_analyzer"
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    },
    "analysis": {
      "analyzer": {
        "fulltext_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "type_as_payload"
          ]
        }
      }
    }
  }
}'
Second, we add some documents:
curl -XPUT 'http://localhost:9200/twitter/tweet/1?pretty=true' -d '{
  "fullname" : "John Doe",
  "text" : "twitter test test test "
}'
curl -XPUT 'http://localhost:9200/twitter/tweet/2?pretty=true' -d '{
  "fullname" : "Jane Doe",
  "text" : "Another twitter test ..."
}'
The following request returns all information and statistics for field "text" in document "1" (John Doe):
curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true' -d '{
  "fields" : ["text"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}'
Equivalently, all parameters can be passed as URI parameters:
curl -XGET 'http://localhost:9200/twitter/tweet/1/_termvector?pretty=true&fields=text&offsets=true&payloads=true&positions=true&term_statistics=true&field_statistics=true'
Response:
{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "1",
  "_version" : 1,
  "exists" : true,
  "term_vectors" : {
    "text" : {
      "field_statistics" : {
        "sum_doc_freq" : 6,
        "doc_count" : 2,
        "sum_ttf" : 8
      },
      "terms" : {
        "test" : {
          "doc_freq" : 2,
          "ttf" : 4,
          "term_freq" : 3,
          "pos" : [ 1, 2, 3 ],
          "start" : [ 8, 13, 18 ],
          "end" : [ 12, 17, 22 ],
          "payload" : [ "d29yZA==", "d29yZA==", "d29yZA==" ]
        },
        "twitter" : {
          "doc_freq" : 2,
          "ttf" : 2,
          "term_freq" : 1,
          "pos" : [ 0 ],
          "start" : [ 0 ],
          "end" : [ 7 ],
          "payload" : [ "d29yZA==" ]
        }
      }
    }
  }
}
Further changes:
-------------------------
XContentBuilder:
- new method
  public XContentBuilder field(XContentBuilderString name, int offset, int length, int... value)
  to put an integer array (see the sketch after this list)
IndicesAnalysisService:
- make the token filter for saving payloads available in elasticsearch
AbstractFieldMapper/TypeParser:
- make the term vector options string available and fix the parsing of this string:
  with_positions_payloads is actually allowed, as can be seen in TermVectorsConsumerPerFields.
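A hedged usage sketch of the new XContentBuilder method; the slice semantics (offset and length into the varargs array) are assumed from the signature:
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentBuilderString;
import org.elasticsearch.common.xcontent.XContentFactory;

public class IntArrayField {
    public static void main(String[] args) throws Exception {
        int[] positions = { 1, 2, 3, 4, 5 };
        XContentBuilder builder = XContentFactory.jsonBuilder()
                .startObject()
                // assumed to emit "pos" : [1,2,3] (the first three values)
                .field(new XContentBuilderString("pos"), 0, 3, positions)
                .endObject();
        System.out.println(builder.string());
    }
}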
Closes#3114
According to #2515, the Ubuntu Software Center does not allow installing
Debian packages which are not lintian compatible.
I worked on the package and made it lintian compatible by:
* Ignoring errors about arch-dependent binaries, as we will not split
this package. The arch-dependent libraries are used correctly.
* Adding a copyright file pointing to the Apache license in debian
Closes#2515, Closes#2320
Currently, if an MPQ is very large, highlighting can take down a node
or cause high CPU / RAM consumption. If the query grows beyond 16 terms,
we just extract the terms and do term-by-term highlighting.
Closes #3142, #3128
The SimpleFragmentsBuilder did not correct offsets if the analysis chain
in use could produce broken offsets, which could lead to
String/ArrayIndexOutOfBounds exceptions.
Closes#3140
Fixed two tests:
- SimpleSortTests#testSortScript, which was not using the mapping correctly
- SearchStatsTests#testSimpleStats, which didn't clear the stats before
running the test, so a previous run could have added queries
The version is now stored in a distinct field that AbstractSimpleEngineTests
didn't correctly add before running tests. This caused a test failure
when the version needed to be loaded from the index.
Since people are using the Oracle Java distribution and not OpenJDK,
the package no longer hard-depends on a JDK (it can of course still be
suggested), so the installation will at least continue. If the init script
is called and no JDK is available via the JAVA_HOME variable, it will exit
with a useful error message.
The Version class had hard-to-understand semantics when two versions were
compared against each other.
Sample of the new logic:
* V_0_20_0.before(V_0_90_0) => true
* V_0_90_0.after(V_0_20_0) => true
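A minimal runnable sketch of the comparisons (assuming the Version constants shown above exist in org.elasticsearch.Version):
import org.elasticsearch.Version;

public class VersionCompareDemo {
    public static void main(String[] args) {
        System.out.println(Version.V_0_20_0.before(Version.V_0_90_0)); // true
        System.out.println(Version.V_0_90_0.after(Version.V_0_20_0));  // true
    }
}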
Closes#3124