OpenSearch

Commit Graph

Author	SHA1	Message	Date
Simon Willnauer	f5331c9535	Cleanup NumericFieldData. FieldData interfaces are reduced to long and double while internal representations still operate on the actual datatypes.	2013-02-08 20:58:36 +01:00
Martijn van Groningen	1189a2c2c2	Extended mv sorting integration test	2013-02-08 15:24:56 +01:00
Martijn van Groningen	8c7779057c	Added sorting by fields that have multiple values per document. Closes #2634	2013-02-08 13:28:40 +01:00
Simon Willnauer	033d6e4306	don't use subtraction for comparison if datatypes can overflow	2013-02-08 10:07:31 +01:00
Martijn van Groningen	f97021b165	Fixes size assertion failure.	2013-02-07 16:50:54 +01:00
Martijn van Groningen	e2cb7edb08	Added more info to assert	2013-02-07 13:52:25 +01:00
Martijn van Groningen	e72e323c8a	Attempt to fix "No active shards" failure	2013-02-07 10:14:10 +01:00
Lee Hinman	ed43ad07d7	Throw a more meaningful message when no document is specified for indexing	2013-02-06 22:33:02 +01:00
Florian Schilling	a52e01f3e5	Remove XTermsFilter and UidFilter in favour of Lucene 4.1 TermsFilter	2013-02-06 18:45:05 +01:00
Igor Motov	6890c9fa62	Move action.wait_on_mapping_change setting to pom	2013-02-06 11:48:58 -05:00
Igor Motov	ed09ba0a18	Improve stability of RecoveryPercolatorTests. Without the "action.wait_on_mapping_change" setting set to true, the test node might get shut down before the updated mapping is saved.	2013-02-05 14:53:46 -05:00
Igor Motov	8277833f8d	Fix settings processing in WordDelimiterTokenFilterFactory	2013-02-05 10:03:00 -05:00
Martijn van Groningen	19295280d9	Made sure that wrapped child query / parent query gets rewritten only once.	2013-02-05 10:27:31 +01:00
Igor Motov	9e89323ad2	Add proper cleanup to InternalSettingsPerparerTests	2013-02-04 19:58:40 -05:00
Martijn van Groningen	bc667c378e	Made SoftWrapper fields final.	2013-02-04 14:47:36 +01:00
Martijn van Groningen	8109d13733	Use CacheRecycler when resolving parent docs in TopChildrenQuery.	2013-02-04 12:46:30 +01:00
Martijn van Groningen	9c3a86875b	Removed `execution_type` for has_child and has_parent.	2013-02-04 11:37:40 +01:00
Igor Motov	20ce01bd53	Add additional query validation to the terms query parser. Fixes #2608	2013-02-03 09:44:16 -05:00
Shay Banon	ebc0c8cc6d	when we fix maxMergeAtOnce, make sure to not set it to 1 as it's an illegal value	2013-02-01 19:00:01 +01:00
Shay Banon	a8c9e580ed	add getMaxOrd, and properly document the difference between it and numOrds	2013-02-01 16:13:13 +01:00
Shay Banon	6f1932ab67	support yaml detection on char sequence	2013-02-01 12:46:19 +01:00
Simon Willnauer	6468c15446	check for == 0 rather than > 0	2013-02-01 11:11:47 +01:00
Simon Willnauer	c18ae4a194	fix getMemorySizeInBytes in SparseMultiArrayOrdinals	2013-02-01 11:09:09 +01:00
Igor Motov	45b2bff8da	Improve SearchStatsTests. Added refresh to guarantee that at least something will be fetched on a fast computer.	2013-01-31 21:19:08 -05:00
Igor Motov	ca635deb36	Allow health to be executed on a local node instead of the master	2013-01-31 21:19:08 -05:00
Igor Motov	3c9541dd14	Make facet and sort tests more reliable in case of multiple nodes and shards. Stats, histogram and range facets and sorting currently fail if a field that they are running on is not defined in the mapping. In the case of dynamic fields, this might mean that by the time the facet query is executed the new field mapping has not been propagated to all nodes yet.	2013-01-31 21:19:07 -05:00
Igor Motov	6a01e7882c	Improve shardsCleanup test. When startNode exits there is no guarantee that shard cleanup is finished, because the cleanup operation is performed on another thread and startNode doesn't wait for it to complete. Therefore we might need to wait for the shard to disappear.	2013-01-31 21:18:14 -05:00
Igor Motov	e32efba3d8	Improve RecoverAfterNodes tests	2013-01-31 20:05:55 -05:00
Martijn van Groningen	5e811e5382	Another small TopChildrenQuery cleanup.	2013-01-31 23:49:32 +01:00
Martijn van Groningen	7ef65688cd	- TopChildrenQuery cleanup. - Added class level jdocs for TopChildrenQuery and ChildrenQuery.	2013-01-31 23:38:09 +01:00
Simon Willnauer	1a1df06411	Move OrdsBuilding into a dedicated class and abstract integer pools used to build sparse ordinals	2013-01-31 19:02:31 +01:00
Martijn van Groningen	1f50b07406	Initial parent/child queries cleanup.	2013-01-31 18:39:31 +01:00
Martijn van Groningen	371b071fb7	Added notion of Rewrite that replaces ScopePhase	2013-01-31 17:24:46 +01:00
Martijn van Groningen	d4ef4697d5	Also remove scope from facet builders. Fixes build.	2013-01-31 16:34:45 +01:00
Martijn van Groningen	46dd42920c	Remove scope support in query and facet dsl.	2013-01-31 15:09:57 +01:00

Remove support for the `scope` field in facets and the `_scope` field in the nested and parent/child queries. The scope support for nested queries will be replaced by the `nested` facet option and a facet filter with a nested filter. The nested filters will now support a `join` option, which controls whether to perform the block join. By default this is enabled, but when disabled it returns the nested documents as hits instead of the joined root document.

Search request with the current scope support:

```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
  "query" : {
    "nested" : {
      "path" : "offers",
      "query" : { "match" : { "offers.color" : "blue" } },
      "_scope" : "my_scope"
    }
  },
  "facets" : {
    "size" : {
      "terms" : { "field" : "offers.size" },
      "scope" : "my_scope"
    }
  }
}'
```

The following is the functional equivalent of using the scope support:

```
curl -s -XPOST 'localhost:9200/products/_search?search_type=count' -d '{
  "query" : {
    "nested" : {
      "path" : "offers",
      "query" : { "match" : { "offers.color" : "blue" } }
    }
  },
  "facets" : {
    "size" : {
      "terms" : { "field" : "offers.size" },
      "facet_filter" : {
        "nested" : {
          "path" : "offers",
          "query" : { "match" : { "offers.color" : "blue" } },
          "join" : false
        }
      },
      "nested" : "offers"
    }
  }
}'
```

The scope support for parent/child queries will be replaced by running the child query as a filter in a global facet.

Search request with the current scope support:

```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
  "query" : {
    "has_child" : {
      "type" : "offer",
      "query" : { "match" : { "color" : "blue" } },
      "_scope" : "my_scope"
    }
  },
  "facets" : {
    "size" : {
      "terms" : { "field" : "size" },
      "scope" : "my_scope"
    }
  }
}'
```

The following is the functional equivalent of using the scope support with parent/child queries:

```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
  "query" : {
    "has_child" : {
      "type" : "offer",
      "query" : { "match" : { "color" : "blue" } }
    }
  },
  "facets" : {
    "size" : {
      "terms" : { "field" : "size" },
      "global" : true,
      "facet_filter" : { "term" : { "color" : "blue" } }
    }
  }
}'
```

Closes #2606

Martijn van Groningen	355381962b	Use only the 'test' index, instead of all indices for child search benchmark.	2013-01-31 13:12:33 +01:00
Shay Banon	6cec73c201	remove fuzzy factor from mapping (internally implemented). We want to support the ~ notation in the query parser for types other than strings. We are getting there: one can now do age:10~5. We would love to support it for dates, as in timestamp:2012-10-10~5d, but that requires changes in the query parser to support strings after the ~ sign	2013-01-31 12:23:03 +01:00
Igor Motov	8df7f2af0d	Improve testReusePeerRecovery test	2013-01-30 19:51:41 -05:00
Igor Motov	29f4274213	Add index cleanup if index creation fails. Fixes #2590	2013-01-30 10:40:01 -05:00
Shay Banon	5c40c97e6e	Id Cache: Allow to configure if ids should be reused (memory wise) or not, default to false. Closes #2605	2013-01-30 14:42:07 +01:00
Martijn van Groningen	bc20f068c9	Made `search_analyzer` updateable via put mapping api. Closes #2604	2013-01-30 11:49:20 +01:00
Martijn van Groningen	e074e00f76	Fielddata: Moved the growing logic to IntArrayRef	2013-01-30 11:20:41 +01:00
Martijn van Groningen	f7692aeef2	Fielddata: IntArrayRef is initialized with small array and grows if needed	2013-01-30 10:57:52 +01:00
Simon Willnauer	5df37eaf75	add more advanced tests for phrase_prefix	2013-01-30 10:51:05 +01:00
Shay Banon	f5e55b7cb9	properly print JVM version	2013-01-29 20:25:13 +01:00
Shay Banon	0568284147	reduce the memory needed while building the sparse array ordinals	2013-01-29 20:23:54 +01:00
Shay Banon	716f2aebbb	add 0.20.5	2013-01-29 10:14:25 +01:00
Simon Willnauer	0697e2f23e	use index prefix in tests to prevent misconfiguration	2013-01-28 15:51:06 +01:00
Simon Willnauer	72a2416a8c	Support MultiPhrasePrefixQuery and MultiPhraseQuery in highlighters. Closes #2596	2013-01-28 15:41:25 +01:00
Martijn van Groningen	2e68207d6d	Updated suggest api.	2013-01-28 15:18:18 +01:00

# Suggest feature

The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment the only supported suggester is `fuzzy`. The suggest feature is available from version `0.21.0`.

# Fuzzy suggester

The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take into account the query that is part of the request.

# Suggest API

The suggest request part is defined alongside the query part as a top-level field in the json request.

```
curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : { ... },
  "suggest" : { ... }
}'
```

Several suggestions can be specified per request. Each suggestion is identified with an arbitrary name. In the example below two suggestions are requested. Both the `my-suggest-1` and `my-suggest-2` suggestions use the `fuzzy` suggester, but have a different `text`.

```
"suggest" : {
  "my-suggest-1" : {
    "text" : "the amsterdma meetpu",
    "fuzzy" : { "field" : "body" }
  },
  "my-suggest-2" : {
    "text" : "the rottredam meetpu",
    "fuzzy" : { "field" : "title" }
  }
}
```

The suggest response example below includes the suggestion response for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains entries. Each entry is effectively a token from the suggest text and contains the suggestion entry text, the original start offset and length in the suggest text and, if found, an arbitrary number of options.

```
{
  ...
  "suggest": {
    "my-suggest-1": [
      {
        "text" : "amsterdma",
        "offset": 4,
        "length": 9,
        "options": [ ... ]
      },
      ...
    ],
    "my-suggest-2" : [ ... ]
  }
  ...
}
```

Each options array contains an option object that includes the suggested text, its document frequency and its score compared to the suggest entry text. The meaning of the score depends on the suggester used. The fuzzy suggester's score is based on the edit distance.

```
"options": [
  {
    "text": "amsterdam",
    "freq": 77,
    "score": 0.8888889
  },
  ...
]
```

# Global suggest text

To avoid repetition of the suggest text, it is possible to define a global text. In the example below the suggest text is defined globally and applies to the `my-suggest-1` and `my-suggest-2` suggestions.

```
"suggest" : {
  "text" : "the amsterdma meetpu",
  "my-suggest-1" : {
    "fuzzy" : { "field" : "title" }
  },
  "my-suggest-2" : {
    "fuzzy" : { "field" : "body" }
  }
}
```

In the above example the suggest text can also be specified as a suggestion specific option. The suggest text specified on the suggestion level overrides the suggest text on the global level.

# Other suggest example.

In the example below we request suggestions for the suggest text `devloping distibutd saerch engies` on the `title` field, with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but it is a nice optimization. The suggestions are gathered in the `query` phase, and when we only care about suggestions (so no hits) we don't need to execute the `fetch` phase.

```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
  "suggest" : {
    "my-title-suggestions-1" : {
      "text" : "devloping distibutd saerch engies",
      "fuzzy" : {
        "size" : 3,
        "field" : "title"
      }
    }
  }
}'
```

The above request could yield the response shown in the code example below. If we take the first suggested option of each suggestion entry we get `developing distributed search engines` as the result.

```
{
  ...
  "suggest": {
    "my-title-suggestions-1": [
      {
        "text": "devloping",
        "offset": 0,
        "length": 9,
        "options": [
          { "text": "developing", "freq": 77, "score": 0.8888889 },
          { "text": "deloping", "freq": 1, "score": 0.875 },
          { "text": "deploying", "freq": 2, "score": 0.7777778 }
        ]
      },
      {
        "text": "distibutd",
        "offset": 10,
        "length": 9,
        "options": [
          { "text": "distributed", "freq": 217, "score": 0.7777778 },
          { "text": "disributed", "freq": 1, "score": 0.7777778 },
          { "text": "distribute", "freq": 1, "score": 0.7777778 }
        ]
      },
      {
        "text": "saerch",
        "offset": 20,
        "length": 6,
        "options": [
          { "text": "search", "freq": 1038, "score": 0.8333333 },
          { "text": "smerch", "freq": 3, "score": 0.8333333 },
          { "text": "serch", "freq": 2, "score": 0.8 }
        ]
      },
      {
        "text": "engies",
        "offset": 27,
        "length": 6,
        "options": [
          { "text": "engines", "freq": 568, "score": 0.8333333 },
          { "text": "engles", "freq": 3, "score": 0.8333333 },
          { "text": "eggies", "freq": 1, "score": 0.8333333 }
        ]
      }
    ]
  }
  ...
}
```

# Common suggest options:

* `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.

# Common fuzzy suggest options

* `field` - The field to fetch the candidate suggestions from. This is a required option that needs to be set either globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum number of corrections to be returned per suggest text token.
* `sort` - Defines how suggestions should be sorted per suggest text term. Two possible values:
** `score` - Sort by score first, then document frequency and then the term itself.
** `frequency` - Sort by document frequency first, then similarity score and then the term itself.
* `suggest_mode` - The suggest mode controls which suggestions are included, or for which suggest text terms suggestions should be made. Three possible values can be specified:
** `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
** `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
** `always` - Suggest any matching suggestions based on terms in the suggest text.

# Other fuzzy suggest options:

* `lowercase_terms` - Lowercases the suggest text terms after text analysis.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimal number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur at the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than `size` can be useful in order to get a more accurate document frequency for spelling corrections, at the cost of performance. Because terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this value will make these document frequencies more precise.
* `max_inspections` - A factor that is multiplied with the `shard_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified then the number cannot be fractional. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly, and excluding them also improves spellcheck performance. The shard level document frequencies are used for this option.
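
The suggest commit (2e68207d6d) notes that taking the first option of each suggestion entry turns `devloping distibutd saerch engies` into `developing distributed search engines`. A minimal Python sketch of that step follows. The helper name `apply_top_options` is hypothetical and not part of the suggest API; the entries are abbreviated from the example response in the commit message, and the sketch assumes `offset` and `length` index into the original suggest text as that message describes.

```python
def apply_top_options(text, entries):
    """Replace each entry's span in `text` with its first option, if it has one."""
    corrected = text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for entry in sorted(entries, key=lambda e: e["offset"], reverse=True):
        options = entry.get("options") or []
        if not options:
            continue  # no suggestion for this token; keep the original text
        start = entry["offset"]
        end = start + entry["length"]
        corrected = corrected[:start] + options[0]["text"] + corrected[end:]
    return corrected


suggest_text = "devloping distibutd saerch engies"

# Entries abbreviated from the example response (first option only).
entries = [
    {"text": "devloping", "offset": 0, "length": 9,
     "options": [{"text": "developing", "freq": 77, "score": 0.8888889}]},
    {"text": "distibutd", "offset": 10, "length": 9,
     "options": [{"text": "distributed", "freq": 217, "score": 0.7777778}]},
    {"text": "saerch", "offset": 20, "length": 6,
     "options": [{"text": "search", "freq": 1038, "score": 0.8333333}]},
    {"text": "engies", "offset": 27, "length": 6,
     "options": [{"text": "engines", "freq": 568, "score": 0.8333333}]},
]

corrected = apply_top_options(suggest_text, entries)
print(corrected)
```

Replacing from the highest offset downwards is a design choice of this sketch: the suggested terms often differ in length from the originals, so replacing left to right would shift the offsets of later entries.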

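Commit 033d6e4306 in the table above avoids subtraction-based comparison for datatypes that can overflow. The commit message does not show the code, so the following is only an illustration of the general pitfall, not the actual change. Python integers do not overflow, so the sketch emulates 32-bit signed wraparound with a hypothetical helper `wrap_int32`; all function names here are assumptions made for the example.

```python
def wrap_int32(value):
    """Emulate 32-bit signed integer wraparound for an arbitrary Python int."""
    return (value + 2**31) % 2**32 - 2**31


def compare_by_subtraction(a, b):
    # Mirrors the "return a - b" comparator idiom on 32-bit integers.
    return wrap_int32(a - b)


def compare_safely(a, b):
    # Compares directly, so no intermediate value can overflow.
    return (a > b) - (a < b)


a, b = 2_000_000_000, -2_000_000_000

# The true difference (4,000,000,000) does not fit in 32 bits and wraps to a
# negative number, so the subtraction idiom wrongly reports a < b.
print(compare_by_subtraction(a, b))
print(compare_safely(a, b))
```

For small values the two comparators agree in sign; they diverge only when the difference falls outside the 32-bit range.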
4549 Commits, All Branches