OpenSearch

Commit Graph

Author	SHA1	Message	Date
Simon Willnauer	f2dc4f810c	Added tests for malformed mappings with no root object. This commit also makes the error message more consistent with other exception messages in the DocumentMapperParser.	2013-08-07 14:01:32 +02:00
Manuel Bernhardt	27518b5e41	Improved error message when the mapping document is malformed	2013-08-07 13:41:49 +02:00
Simon Willnauer	7f0115ba9a	Return nothing instead of everything in MLT if no field is supported. Today, due to the optimizations in the boolean query builder, we adjust a pure negative query with a 'match_all'. This is not the desired behavior in the MLT API if all the fields in a document are unsupported. If that happens today we return all documents but the one MLT is executed on. Closes #3453	2013-08-07 13:25:09 +02:00
Martijn van Groningen	73c038fb48	Improved filtering by _parent field. In the _parent field the type and id of the parent are stored as type#id. Because of this, a term filter on the _parent field with the parent id is always resolved to a terms filter with a type / id combination for each type in the mapping. This can be improved by automatically using the most optimized filter (either term or terms) based on the number of parent types in the mapping. Also added support for using the parent type in the term filter for the _parent field, like this: ```json { "term" : { "_parent" : "parent_type#1" } } ``` This will then always automatically use the term filter. Closes #3454	2013-08-07 13:20:21 +02:00
Martijn van Groningen	5e0b1621b4	added Lucene upgrade reminder	2013-08-07 10:46:25 +02:00
Martijn van Groningen	12c7eeb262	Added `size` option to percolate api. The `size` option in the percolate api will limit the number of matches being returned: ```bash curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{ "size" : 10, "doc" : {...} }' ``` In the above request no more than 10 matches will be returned. The `count` field will still return the total number of matches the document matched with. The `size` option is not applicable for the count percolate api. Closes #3440	2013-08-07 10:27:20 +02:00
Simon Willnauer	662bb80d6b	Add binary protocol backwards compatibility for suggest highlights. This change requires different request processing at the binary protocol level, since we provide compatibility across minor versions. The suggest feature is still experimental, but we make a best effort to keep upgrades as seamless as possible.	2013-08-07 10:19:11 +02:00
Luca Cavanna	3574d9de49	added explicit creation of parent type in create index	2013-08-06 23:10:33 +02:00
Nik Everett	72d6d822ae	Add highlighting support for suggester. This commit adds general highlighting support to the suggest feature. The only implementation that implements this functionality at this point is the phrase suggester. The API supports a 'pre_tag' and a 'post_tag' that are used to wrap suggested parts of the given user input changed by the suggester. Closes #3442	2013-08-06 20:57:39 +02:00
Britta Weber	a938bd57a9	add assertion for cast double->float. ScoreFunction scoring might result in under or overflow, for example if a user decides to use the timestamp as a boost in the script scorer. Therefore, check if the cast causes a huge precision loss. Note that this does not always detect casting issues. For example in ScriptFunction.score() the function SearchScript.runAsDouble() is called. AbstractFloatSearchScript implements it as follows: @Override public double runAsDouble() { return runAsFloat(); } In this case the cast happens before the assertion and therefore precision loss or over/underflows cannot be detected by the assertion.	2013-08-06 18:39:36 +02:00
Britta Weber	e707308f1f	Distance scoring ================ It might sometimes be desirable to have a tool available that allows multiplying the original score of a document by a function that decays depending on the distance of a numeric field value of the document from a user-given reference. These functions could be computed for several numeric fields and eventually be combined as a sum or a product and multiplied onto the score of the original query. This commit adds new score functions, similar to boost factor and custom script scoring, that can be used together with the <code>function_score</code> keyword in a query. To use distance scoring, the user has to define 1. a reference and 2. a scale for each field the function should be applied to. A reference is needed to define a distance for the document and a scale to define the rate of decay. Example use case ---------------- Suppose you are searching for a hotel in a certain town. Your budget is limited. Also, you would like the hotel to be close to the town center, so the farther the hotel is from the desired location the less likely you are to check in. You would like the query results that match your criterion (for example, "hotel, Berlin, non-smoker") to be scored with respect to distance to the town center and also the price. Intuitively, you would like to define the town center as the origin and maybe you are willing to walk 2km to the town center from the hotel. In this case your reference for the location field is the town center and the scale is ~2km. If your budget is low, you would probably prefer something cheap over something expensive. For the price field, the reference would be 0 Euros and the scale depends on how much you are willing to pay, for example 20 Euros. Usage ---------------- The distance score functions can be applied in two ways: In the simplest case, only one numeric field is to be evaluated. To do so, call <code>function_score</code> with the appropriate function. 
In the above example, this might be: curl 'localhost:9200/hotels/_search/' -d '{ "query": { "function_score": { "gauss": { "location": { "reference": [ 52.516272, 13.377722 ], "scale": "2km" } }, "query": { "bool": { "must": { "city": "Berlin" } } } } } }' which would then search for hotels in Berlin and weight them depending on how far they are from the Brandenburg Gate. If you have more than one numeric field, you can combine them by defining a series of functions and filters, for example: curl 'localhost:9200/hotels/_search/' -d '{ "query": { "function_score": { "functions": [ { "filter": { "match_all": {} }, "gauss": { "location": { "reference": "11,12", "scale": "2km" } } }, { "filter": { "match_all": {} }, "linear": { "price": { "reference": "0", "scale": "20" } } } ], "query": { "bool": { "must": { "city": "Berlin" } } }, "score_mode": "multiply" } } }' This would effectively compute the decay function for "location" and "price" and multiply them onto the score. See <code>function_score</code> for the different options for combining functions. Supported fields ---------------- Only single valued numeric fields, including time and geo locations, are supported. What if a field is missing? ---------------- If the numeric field is missing in the document, that field will not be taken into account at all for this document. The function value for this field is set to 1 for this document. Suppose you have two hotels, both of which are in Berlin and cost the same. If one of the documents does not have a "location", this document would get a higher score than the document having the "location" field set. To avoid this, you could, for example, use the exists or the missing filter and add a custom boost factor to the functions. 
… "functions": [ { "filter": { "match_all": {} }, "gauss": { "location": { "reference": "11, 12", "scale": "2km" } } }, { "filter": { "match_all": {} }, "linear": { "price": { "reference": "0", "scale": "20" } } }, { "boost_factor": 0.001, "filter": { "bool": { "must_not": { "missing": { "existence": true, "field": "coordinates", "null_value": true } } } } } ], ... Closes #3423	2013-08-06 18:37:55 +02:00
Britta Weber	720b550a94	Unify custom scores =================== The custom boost factor, custom script boost and the filters function query all do the same thing: They take a query and for each found document compute a new score based on the query score and some script, some custom boost factor or a combination of these two. However, the json format for these three functionalities is very different. This makes it hard to add new functions. This commit introduces one keyword <code>function_score</code> for all three functions. The new format can be used to either compute a new score with one function: "function_score": { "(query|filter)": {}, "boost": "boost for the whole query", "function": {} } or allow combining the newly computed scores: "function_score": { "(query|filter)": {}, "boost": "boost for the whole query", "functions": [ { "filter": {}, "function": {} }, { "function": {} } ], "score_mode": "(mult|max|...)" } <code>function</code> here can be either "script_score": { "lang": "lang", "params": { "param1": "value1", "param2": "value2" }, "script": "some script" } or "boost_factor" : number New custom functions can be added via the function score module. Changes --------- The custom boost factor query "custom_boost_factor" : { "query" : { .... }, "boost_factor" : 5.2 } becomes "function_score" : { "query" : { .... }, "boost_factor" : 5.2 } The custom script score "custom_score" : { "query" : { .... }, "params" : { "param1" : 2, "param2" : 3.1 }, "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)" } becomes "function_score" : { "query" : { .... 
}, "script_score" : { "params" : { "param1" : 2, "param2" : 3.1 }, "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)" } } and the custom filters score query "custom_filters_score" : { "query" : { "match_all" : {} }, "filters" : [ { "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } }, "boost" : "3" }, { "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } }, "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)" } ], "score_mode" : "first", "params" : { "param1" : 2, "param2" : 3.1 } } becomes: "function_score" : { "query" : { "match_all" : {} }, "functions" : [ { "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } }, "boost" : "3" }, { "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } }, "script_score" : { "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)", "params" : { "param1" : 2, "param2" : 3.1 } } } ], "score_mode" : "first" } Partially closes issue #3423	2013-08-06 18:37:34 +02:00
Luca Cavanna	e1c739fe6f	Improved test, printed out potential shard failures	2013-08-06 16:24:29 +02:00
Alexander Reelsen	0db2db612b	RPM init script bugfix for an issue that might prevent startup. Removed dangerous set calls, which might not restore the current state but set something invalid, causing the script to stop when proceeding.	2013-08-06 16:19:53 +02:00
Luca Cavanna	a3071540d7	Added support for readable_format parameter when printing out time and size values. The following APIs are affected by this change and now support the readable_format flag (default false when not specified): - indices segments - indices stats - indices status - cluster nodes stats - cluster nodes info Closes #3432	2013-08-06 16:08:47 +02:00
Shay Banon	ebb4bcd45e	add 0.90.4	2013-08-06 15:28:02 +02:00
Alexander Reelsen	68b77c1ae3	Included only runtime dependencies when copying. This makes sure that no test dependencies are placed in the distribution.	2013-08-06 15:13:25 +02:00
Martijn van Groningen	fec196b8d8	Better check for verifying that the _percolator type is removed	2013-08-06 14:20:36 +02:00
Boaz Leskes	43e374f793	Maxing out retries on conflict in bulk update causes null pointer exceptions. Also: Bulk update performs one less retry than requested. Documentation for retries on conflict says it defaults to 1 (but the default is 0). TransportShardReplicationOperationAction methods now catch Throwables instead of exceptions. Added a little extra check to UpdateTests.concurrentUpdateWithRetryOnConflict. Closes #3447 & #3448	2013-08-06 13:06:06 +02:00
Luca Cavanna	636c35d0d4	Added missing metadata fields to upserted documents (parent, routing, ttl, timestamp, version and versionType) Closes #3444	2013-08-06 12:00:44 +02:00
Simon Willnauer	88a0e4628a	Catch RejectedExecutionException in outer ping request	2013-08-05 23:33:38 +02:00
Martijn van Groningen	a237eead55	If the _percolator has been removed then also remove percolator queries.	2013-08-05 18:43:11 +02:00
Simon Willnauer	1983a3676a	Use domain specific assertions for shard failures across tests	2013-08-05 17:50:24 +02:00
Simon Willnauer	df747836d8	Use busy sleeps in NoMasterNodeTests. The busy sleep is less prone to slow tests / machines while still failing if the actual condition isn't met.	2013-08-05 16:50:45 +02:00
Simon Willnauer	d949f67241	Add better assertion reporting if nodes are not present in the ClusterState	2013-08-05 15:40:54 +02:00
Martijn van Groningen	e55dab94ea	the ttl purger might have already deleted the documents.	2013-08-05 14:22:47 +02:00
Shay Banon	d7922b8554	Streamline Search / Broadcast (count, suggest, refresh, ...) APIs header closes #3441	2013-08-05 12:55:38 +02:00
Simon Willnauer	539ffb9ef5	Fix occasionally hanging test moving away from timeouts. Fixes EsExecutorTests to use latches and a busy wait util from ElasticsearchTestCase. This commit also adds some minor randomization to the test.	2013-08-05 11:43:48 +02:00
Simon Willnauer	094c10d62d	Added busy waiting util and add suite timeout. Some rare tests need to busy-wait a short time until a given condition occurs, for instance until a threadpool has scaled down the number of threads. This commit adds a util that waits up to a given time until a condition is met. In contrast to Thread.sleep, this method iteratively doubles the waiting time, so that fast tests do not always have to wait a full sleep interval. This commit also adds a suite timeout to fail a test if the test times out. The test infrastructure will provide thread stack traces if the timeout kicks in. The default timeout is set to 1h.	2013-08-05 11:43:47 +02:00
Alexander Reelsen	9c7a87f118	Overwriting pidfile on startup. The current implementation does not overwrite the pidfile, but only prepends the new PID to it. So if the process ID is 4 digits long, but the file is already there with a 5 digit number, the file will contain 5 digits after the write. Note: If the pidfile still exists, this usually means that either an instance using this pidfile is already running or the process has not finished correctly. Closes #3425	2013-08-05 11:28:37 +02:00
Alexander Reelsen	94d3e27940	Added index templates REST support for HEAD and proper 404 * Added HEAD support for index templates to find out if they exist * Returning a 404 instead of a 200 if a GET hits a non-existing index template Closes #3434	2013-08-04 13:51:34 +02:00
Lukas Vlcek	f2168d32c1	Make (HighlightBuilder|SearchContextHighlight).Field consistent. Update the HighlightBuilder.Field API; it should allow for the same API as SearchContextHighlight.Field. In other words, what is possible to set up using the DSL in highlighting at the field level is also possible via the Java API. Closes #3435	2013-08-02 22:01:35 +02:00
Martijn van Groningen	5cf429d144	Wait for green status when index is created	2013-08-02 20:56:54 +02:00
Simon Willnauer	263c5808bb	Don't cache BytesRef in ThreadLocal	2013-08-02 20:30:52 +02:00
Luca Cavanna	85b7efa08b	Added support for named filters in top-level filter Closes #3097	2013-08-02 17:13:46 +02:00
Martijn van Groningen	bd324676bc	Removed AliasMissingException, get alias api will now just return an empty map. In the rest layer a 404 is returned when map is empty.	2013-08-02 17:10:16 +02:00
Martijn van Groningen	1f71890e10	Use assertions that print out shard failures, if there are any	2013-08-02 16:31:00 +02:00
Shay Banon	1a6514c413	mark bool field type as not tokenized. Even though we use the keyword analyzer for the bool type, we should mark it as not tokenized in the lucene field type as well; there is no reason to take it through the analysis phase to begin with.	2013-08-02 14:44:00 +02:00
Simon Willnauer	012d47b500	Use debug logging rather than info for rejected ping task. This exception is thrown on node shutdown and doesn't indicate a critical situation but rather is caught for consistency reasons.	2013-08-02 14:10:55 +02:00
Martijn van Groningen	890d06f018	Added count percolate api. Added a new percolate api that only returns the number of percolate queries that have matched with the document being percolated. The actual query ids are not included. The percolate total count will be put in the total field and is the only result that will be returned from the dedicated count apis. The total field will also be included in the already existing percolate and percolating existing document apis and is equal to the number of matches. Closes #3430	2013-08-02 12:30:20 +02:00
Simon Willnauer	2a211705a3	Catch and Log RejectedExecutionException in async ping	2013-08-02 11:32:15 +02:00
Shay Banon	a8dcfa5deb	Search on a shard group while relocation final flip happens might fail. Single shard read operations should have the same override exception logic as search and broadcast. relates to #3427	2013-08-02 09:56:56 +02:00
Alexander Reelsen	343871fcf5	Allow bin/plugin to set -D JVM parameters. Currently the bin/plugin command does not allow one to set JVM parameters for startup. Usually these parameters are not needed (no need to configure heap sizes for such a short running process), but one could not set the configuration path, and that one is important for plugins in order to find out where the plugin directory is. This is especially problematic when elasticsearch is installed as a debian/rpm package, because the configuration file is not placed in the same directory structure as the plugin shell script. This pull request allows calling bin/plugin like this: bin/plugin -Des.default.config=/etc/elasticsearch/elasticsearch.yml -install mobz/elasticsearch-head As a last small improvement, the PluginManager now outputs the directory the plugin was installed to in order to avoid confusion. Closes #3304	2013-08-02 09:19:57 +02:00
Shay Banon	235b3a3635	Search on a shard group while relocation final flip happens might fail. Make sure relocating shards add their corresponding initializing shard routing when searching across initializing shards. Also, make shardFailures lazy again. closes #3427	2013-08-02 00:20:10 +02:00
Shay Banon	ebda203ce6	less aggressive timeout to catch it on the pending check	2013-08-01 19:52:37 +02:00
Shay Banon	192025401b	improve test to wait for nodes before getting the local node id	2013-08-01 19:45:08 +02:00
Shay Banon	f3d3a8bd58	Search on a shard group while relocation final flip happens might fail. At the final stage of a relocation, during the final flip of the states, a search request might hit a node that would then execute it on a shard that has already relocated. For this, we need to execute broadcast and search operations against initializing shards as well, but only as a last resort. The operation will be rejected if not applicable (i.e. IndexShard#searcher() checked for read allowed). Note, this requires careful thought about which failures we send back. If we try an initializing shard and it fails, its failure should not override an actual failure of an active shard. Also, removed an atomic integer used in broadcast request and use a similar shard index trick we now have in our search execution. closes #3427	2013-08-01 18:35:58 +02:00
Luca Cavanna	60bddc28eb	Modified test to make failures clearer. Added shard failure check when sorting on unmapped field; it could be any SearchPhaseExecutionException otherwise (e.g. missing shards).	2013-08-01 17:07:52 +02:00
Simon Willnauer	f2f70a415a	Take fragile test out of the loop. UpdateNumberOfReplicasTests#simpleUpdateNumberOfReplicasTests is very fragile due to executing searches based on dated knowledge of the cluster state and calling shards that have been relocated away in the meantime. A fix is on the way.	2013-08-01 15:40:04 +02:00
Martijn van Groningen	300db594aa	Run refresh before executing non realtime get	2013-08-01 15:12:15 +02:00
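
The busy-waiting util described in commit 094c10d62d can be sketched roughly as follows. This is a minimal illustration in Python rather than the project's Java, and the names `await_busy`, `max_wait_s` and `initial_sleep_s` are hypothetical, not taken from the commit. The helper polls a condition and doubles the sleep between checks, so a condition that becomes true quickly does not pay for a full fixed sleep interval.

```python
import time
from typing import Callable


def await_busy(condition: Callable[[], bool],
               max_wait_s: float = 10.0,
               initial_sleep_s: float = 0.001) -> bool:
    """Poll `condition` until it returns True or `max_wait_s` has elapsed.

    Hypothetical helper: the sleep between checks starts at
    `initial_sleep_s` and doubles after every failed check.
    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.monotonic() + max_wait_s
    sleep_s = initial_sleep_s
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the overall deadline.
        time.sleep(min(sleep_s, remaining))
        sleep_s *= 2  # double the wait for the next round
```

A test would call something like `assert await_busy(lambda: pool_has_scaled_down())` (with `pool_has_scaled_down` standing in for whatever check the test needs) instead of sleeping for a fixed interval and then asserting.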


5507 Commits All Branches Search