Commit Graph

4549 Commits

Author SHA1 Message Date
Simon Willnauer f5331c9535 Cleanup NumericFieldData. FieldData interfaces are reduced to long and double while internal
representations still operate on the actual datatypes.
2013-02-08 20:58:36 +01:00
Martijn van Groningen 1189a2c2c2 Extended mv sorting integration test 2013-02-08 15:24:56 +01:00
Martijn van Groningen 8c7779057c Added sorting by fields that have multiple values per document.
Closes #2634
2013-02-08 13:28:40 +01:00
Simon Willnauer 033d6e4306 don't use subtraction for comparison if datatypes can overflow 2013-02-08 10:07:31 +01:00
Martijn van Groningen f97021b165 Fixes size assertion failure. 2013-02-07 16:50:54 +01:00
Martijn van Groningen e2cb7edb08 Added more info to assert 2013-02-07 13:52:25 +01:00
Martijn van Groningen e72e323c8a Attempt to fix "No active shards" failure 2013-02-07 10:14:10 +01:00
Lee Hinman ed43ad07d7 Throw a more meaningful message when no document is specified for indexing 2013-02-06 22:33:02 +01:00
Florian Schilling a52e01f3e5 Remove XTermsFilter and UidFilter in favour of Lucene 4.1 TermsFilter 2013-02-06 18:45:05 +01:00
Igor Motov 6890c9fa62 Move action.wait_on_mapping_change setting to pom 2013-02-06 11:48:58 -05:00
Igor Motov ed09ba0a18 Improve stability of RecoveryPercolatorTests
Without the "action.wait_on_mapping_change" setting set to true, the test node might get shut down before the updated mapping is saved.
2013-02-05 14:53:46 -05:00
Igor Motov 8277833f8d Fix settings processing in WordDelimiterTokenFilterFactory 2013-02-05 10:03:00 -05:00
Martijn van Groningen 19295280d9 Made sure that wrapped child query / parent query gets rewritten only once. 2013-02-05 10:27:31 +01:00
Igor Motov 9e89323ad2 Add proper cleanup to InternalSettingsPerparerTests 2013-02-04 19:58:40 -05:00
Martijn van Groningen bc667c378e Made SoftWrapper fields final. 2013-02-04 14:47:36 +01:00
Martijn van Groningen 8109d13733 Use CacheRecycler when resolving parent docs in TopChildrenQuery. 2013-02-04 12:46:30 +01:00
Martijn van Groningen 9c3a86875b Removed `execution_type` for has_child and has_parent. 2013-02-04 11:37:40 +01:00
Igor Motov 20ce01bd53 Add additional query validation to the terms query parser
Fixes #2608
2013-02-03 09:44:16 -05:00
Shay Banon ebc0c8cc6d when we fix maxMergeAtOnce, make sure to not set it to 1 as it's an illegal value 2013-02-01 19:00:01 +01:00
Shay Banon a8c9e580ed add getMaxOrd, and properly document the difference between it and numOrds 2013-02-01 16:13:13 +01:00
Shay Banon 6f1932ab67 support yaml detection on char sequence 2013-02-01 12:46:19 +01:00
Simon Willnauer 6468c15446 check for == 0 rather than > 0 2013-02-01 11:11:47 +01:00
Simon Willnauer c18ae4a194 fix getMemorySizeInBytes in SparseMultiArrayOrdinals 2013-02-01 11:09:09 +01:00
Igor Motov 45b2bff8da Improve SearchStatsTests
Added refresh to guarantee that at least something will be fetched on a fast computer.
2013-01-31 21:19:08 -05:00
Igor Motov ca635deb36 Allow health to be executed on a local node instead of the master 2013-01-31 21:19:08 -05:00
Igor Motov 3c9541dd14 Make facet and sort tests more reliable in case of multiple nodes and shards
Stats, histogram and range facets and sorting currently fail if a field that they are running on is not defined in the mapping. In case of dynamic fields it might mean that by the time the facet query is executed the new field mapping might not be propagated to all nodes yet.
2013-01-31 21:19:07 -05:00
Igor Motov 6a01e7882c Improve shardsCleanup test
When startNode exits there is no guarantee that shard cleanup is finished because the cleanup operation is performed on another thread and startNode doesn't wait for it to complete. Therefore we might need to wait for the shard to disappear.
2013-01-31 21:18:14 -05:00
Igor Motov e32efba3d8 Improve RecoverAfterNodes tests 2013-01-31 20:05:55 -05:00
Martijn van Groningen 5e811e5382 Another small TopChildrenQuery cleanup. 2013-01-31 23:49:32 +01:00
Martijn van Groningen 7ef65688cd - TopChildrenQuery cleanup.
- Added class level jdocs for TopChildrenQuery and ChildrenQuery.
2013-01-31 23:38:09 +01:00
Simon Willnauer 1a1df06411 Move OrdsBuilding into a dedicated class and abstract integer pools used to build sparse ordinals 2013-01-31 19:02:31 +01:00
Martijn van Groningen 1f50b07406 Initial parent/child queries cleanup. 2013-01-31 18:39:31 +01:00
Martijn van Groningen 371b071fb7 Added notion of Rewrite that replaces ScopePhase 2013-01-31 17:24:46 +01:00
Martijn van Groningen d4ef4697d5 Also remove scope from facet builders. Fixes build. 2013-01-31 16:34:45 +01:00
Martijn van Groningen 46dd42920c Remove scope support in query and facet dsl.
Remove support for the `scope` field in facets and the `_scope` field in the nested and parent/child queries. The scope support for nested queries will be replaced by the `nested` facet option and a facet filter with a nested filter. The nested filters will now support a `join` option, which controls whether to perform the block join. By default this is enabled, but when disabled it returns the nested documents as hits instead of the joined root document.

Search request with the current scope support:
```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
    "query" : {
		"nested" : {
			"path" : "offers",
			"query" : {
				"match" : {
					"offers.color" : "blue"
				}
			},
			"_scope" : "my_scope"
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "offers.size"
			},
			"scope" : "my_scope"
		}
	}
}'
```

The following is the functional equivalent of using the scope support:
```
curl -s -XPOST 'localhost:9200/products/_search?search_type=count' -d '{
    "query" : {
		"nested" : {
			"path" : "offers",
			"query" : {
				"match" : {
					"offers.color" : "blue"
				}
			}
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "offers.size"
			},
			"facet_filter" : {
				"nested" : {
					"path" : "offers",
					"query" : {
						"match" : {
							"offers.color" : "blue"
						}
					},
					"join" : false
				}
			},
			"nested" : "offers"
		}
	}
}'
```
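
As a minimal sketch of the rewrite described above, the request body can be assembled so that the facet filter repeats the nested query with `join` disabled. The helper name `nested_facet_request` and its parameters are hypothetical and exist only for illustration; the body it produces mirrors the request shown above.

```python
import json


def nested_facet_request(path, query, facet_name, facet_field):
    """Build a search body that facets on matching nested docs without `_scope`.

    Hypothetical helper: the same nested query is used for the main query and,
    with `join` set to false, for the facet filter.
    """
    return {
        "query": {"nested": {"path": path, "query": query}},
        "facets": {
            facet_name: {
                "terms": {"field": facet_field},
                "facet_filter": {
                    # join=false: filter on the nested documents themselves
                    "nested": {"path": path, "query": query, "join": False}
                },
                # run the facet on the nested documents under `path`
                "nested": path,
            }
        },
    }


body = nested_facet_request(
    "offers", {"match": {"offers.color": "blue"}}, "size", "offers.size"
)
print(json.dumps(body, indent=2))
```

Keeping the query in one variable avoids the main query and the facet filter drifting apart when one of them is edited.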

The scope support for parent/child queries will be replaced by running the child query as filter in a global facet.

Search request with the current scope support:
```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
	"query" : {
		"has_child" : {
			"type" : "offer",
			"query" : {
				"match" : {
					"color" : "blue"
				}
			},
			"_scope" : "my_scope"
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "size"
			},
			"scope" : "my_scope"
		}
	}
}'
```

The following is the functional equivalent of using the scope support with parent/child queries:
```
curl -s -XPOST 'localhost:9200/products/_search' -d '{
	"query" : {
		"has_child" : {
			"type" : "offer",
			"query" : {
				"match" : {
					"color" : "blue"
				}
			}
		}
	},
	"facets" : {
		"size" : {
			"terms" : {
				"field" : "size"
			},
			"global" : true,
			"facet_filter" : {
				"term" : {
					"color" : "blue"
				}
			}
		}
	}
}'
```

Closes #2606
2013-01-31 15:09:57 +01:00
Martijn van Groningen 355381962b Use only the 'test' index, instead of all indices for child search benchmark. 2013-01-31 13:12:33 +01:00
Shay Banon 6cec73c201 remove fuzzy factor from mapping (internally implemented)
We want to support the ~ notation in the query parser for types other than strings. We are getting there: one can now do age:10~5. We would love to support it for dates, as in timestamp:2012-10-10~5d, but that requires changes in the query parser to support strings after the ~ sign.
2013-01-31 12:23:03 +01:00
Igor Motov 8df7f2af0d Improve testReusePeerRecovery test 2013-01-30 19:51:41 -05:00
Igor Motov 29f4274213 Add index cleanup if index creation fails
Fixes #2590
2013-01-30 10:40:01 -05:00
Shay Banon 5c40c97e6e Id Cache: Allow to configure if ids should be reused (memory wise) or not, default to false
closes #2605
2013-01-30 14:42:07 +01:00
Martijn van Groningen bc20f068c9 Made `search_analyzer` updateable via put mapping api.
Closes #2604
2013-01-30 11:49:20 +01:00
Martijn van Groningen e074e00f76 Fielddata: Moved the growing logic to IntArrayRef 2013-01-30 11:20:41 +01:00
Martijn van Groningen f7692aeef2 Fielddata: IntArrayRef is initialized with small array and grows if needed 2013-01-30 10:57:52 +01:00
Simon Willnauer 5df37eaf75 add more advanced tests for phrase_prefix 2013-01-30 10:51:05 +01:00
Shay Banon f5e55b7cb9 properly print JVM version 2013-01-29 20:25:13 +01:00
Shay Banon 0568284147 reduce the memory needed while building the sparse array ordinals 2013-01-29 20:23:54 +01:00
Shay Banon 716f2aebbb add 0.20.5 2013-01-29 10:14:25 +01:00
Simon Willnauer 0697e2f23e use index prefix in tests to prevent misconfiguration 2013-01-28 15:51:06 +01:00
Simon Willnauer 72a2416a8c Support MultiPhrasePrefixQuery and MultiPhraseQuery in highlighters
Closes #2596
2013-01-28 15:41:25 +01:00
Martijn van Groningen 2e68207d6d Updated suggest api.
# Suggest feature
The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment the only supported suggester is `fuzzy`. The suggest feature is available from version `0.21.0`.

# Fuzzy suggester
The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query that is part of the request into account.

# Suggest API
The suggest request part is defined alongside the query part as a top-level field in the JSON request.

```
curl -s -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    ...
  },
  "suggest" : {
    ...
  }
}'
```

Several suggestions can be specified per request. Each suggestion is identified with an arbitrary name. In the example below two suggestions are requested. Both the `my-suggest-1` and `my-suggest-2` suggestions use the `fuzzy` suggester, but have a different `text`.

```
"suggest" : {
  "my-suggest-1" : {
    "text" : "the amsterdma meetpu",
    "fuzzy" : {
      "field" : "body"
    }
  },
  "my-suggest-2" : {
    "text" : "the rottredam meetpu",
    "fuzzy" : {
      "field" : "title"
    }
  }
}
```

The suggest response example below includes the suggestion responses for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains entries. Each entry is effectively a token from the suggest text and contains the suggestion entry text, the original start offset and length in the suggest text and, if found, an arbitrary number of options.

```
{
  ...
  "suggest": {
    "my-suggest-1": [
      {
        "text" : "amsterdma",
        "offset": 4,
        "length": 9,
        "options": [
           ...
        ]
      },
      ...
    ],
    "my-suggest-2" : [
      ...
    ]
  }
  ...
}
```

Each options array contains option objects that include the suggested text, its document frequency and its score compared to the suggest entry text. The meaning of the score depends on the suggester used. The fuzzy suggester's score is based on the edit distance.

```
"options": [
  {
    "text": "amsterdam",
    "freq": 77,
    "score": 0.8888889
  },
  ...
]
```
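
The sample scores in this message are consistent with `1 - distance / min(len(a), len(b))`, where `distance` counts insertions, deletions, substitutions and adjacent transpositions. This formula is an inference from the example values only, not a statement of how the suggester computes its score, and the function names below are hypothetical. A minimal sketch:

```python
def osa_distance(a, b):
    """Edit distance with adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def similarity(a, b):
    """Assumed score: 1 - distance / length of the shorter term."""
    return 1.0 - osa_distance(a, b) / min(len(a), len(b))


print(round(similarity("amsterdma", "amsterdam"), 7))
```

Under this assumption a single transposition in a nine-letter term costs one ninth of the score, which matches the `amsterdam` option shown above.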

# Global suggest text

To avoid repetition of the suggest text, it is possible to define a global text. In the example below the suggest text is defined globally and applies to the `my-suggest-1` and `my-suggest-2` suggestions.

```
"suggest" : {
  "text" : "the amsterdma meetpu",
  "my-suggest-1" : {
    "fuzzy" : {
      "field" : "title"
    }
  },
  "my-suggest-2" : {
    "fuzzy" : {
      "field" : "body"
    }
  }
}
```

In the above example the suggest text can also be specified as a suggestion-specific option. The suggest text specified on the suggestion level overrides the suggest text on the global level.
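
The override rule can be sketched as a lookup that prefers the suggestion-level `text` and falls back to the global one. The function `effective_texts` is a hypothetical helper operating on the request body as a plain dictionary; it is not part of any API.

```python
def effective_texts(suggest):
    """Map each suggestion name to the suggest text that applies to it."""
    global_text = suggest.get("text")
    resolved = {}
    for name, spec in suggest.items():
        if name == "text":
            continue  # the global text is not a suggestion itself
        text = spec.get("text", global_text)  # suggestion level wins
        if text is None:
            raise ValueError("no suggest text for suggestion %r" % name)
        resolved[name] = text
    return resolved


suggest = {
    "text": "the amsterdma meetpu",
    "my-suggest-1": {"fuzzy": {"field": "title"}},
    "my-suggest-2": {"text": "the rottredam meetpu", "fuzzy": {"field": "body"}},
}
print(effective_texts(suggest))
```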

# Another suggest example

In the example below we request suggestions for the following suggest text: `devloping distibutd saerch engies` on the `title` field with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but it is a nice optimization. The suggestions are gathered in the `query` phase, and in the case that we only care about suggestions (so no hits) we don't need to execute the `fetch` phase.

```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
  "suggest" : {
    "my-title-suggestions-1" : {
      "text" : "devloping distibutd saerch engies",
      "fuzzy" : {
        "size" : 3,
        "field" : "title"
      }
    }
  }
}'
```

The above request could yield the response shown in the code example below. As you can see, if we take the first suggested option of each suggestion entry we get `developing distributed search engines` as the result.

```
{
  ...
  "suggest": {
    "my-title-suggestions-1": [
      {
        "text": "devloping",
        "offset": 0,
        "length": 9,
        "options": [
          {
            "text": "developing",
            "freq": 77,
            "score": 0.8888889
          },
          {
            "text": "deloping",
            "freq": 1,
            "score": 0.875
          },
          {
            "text": "deploying",
            "freq": 2,
            "score": 0.7777778
          }
        ]
      },
      {
        "text": "distibutd",
        "offset": 10,
        "length": 9,
        "options": [
          {
            "text": "distributed",
            "freq": 217,
            "score": 0.7777778
          },
          {
            "text": "disributed",
            "freq": 1,
            "score": 0.7777778
          },
          {
            "text": "distribute",
            "freq": 1,
            "score": 0.7777778
          }
        ]
      },
      {
        "text": "saerch",
        "offset": 20,
        "length": 6,
        "options": [
          {
            "text": "search",
            "freq": 1038,
            "score": 0.8333333
          },
          {
            "text": "smerch",
            "freq": 3,
            "score": 0.8333333
          },
          {
            "text": "serch",
            "freq": 2,
            "score": 0.8
          }
        ]
      },
      {
        "text": "engies",
        "offset": 27,
        "length": 6,
        "options": [
          {
            "text": "engines",
            "freq": 568,
            "score": 0.8333333
          },
          {
            "text": "engles",
            "freq": 3,
            "score": 0.8333333
          },
          {
            "text": "eggies",
            "freq": 1,
            "score": 0.8333333
          }
        ]
      }
    ]
  }
  ...
}
```
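
Taking the first option of each entry can be sketched with the `offset` and `length` fields from the response. The helper `apply_first_options` is hypothetical, and the entries below are trimmed to the first option of each entry in the example response.

```python
def apply_first_options(text, entries):
    """Replace each suggested token in `text` with its first option, if any."""
    corrected = text
    # Work from the end of the text so earlier offsets stay valid
    # when a replacement changes the length of the string.
    for entry in sorted(entries, key=lambda e: e["offset"], reverse=True):
        if not entry["options"]:
            continue  # nothing suggested for this token
        start = entry["offset"]
        end = start + entry["length"]
        corrected = corrected[:start] + entry["options"][0]["text"] + corrected[end:]
    return corrected


entries = [
    {"text": "devloping", "offset": 0, "length": 9,
     "options": [{"text": "developing", "freq": 77, "score": 0.8888889}]},
    {"text": "distibutd", "offset": 10, "length": 9,
     "options": [{"text": "distributed", "freq": 217, "score": 0.7777778}]},
    {"text": "saerch", "offset": 20, "length": 6,
     "options": [{"text": "search", "freq": 1038, "score": 0.8333333}]},
    {"text": "engies", "offset": 27, "length": 6,
     "options": [{"text": "engines", "freq": 568, "score": 0.8333333}]},
]
print(apply_first_options("devloping distibutd saerch engies", entries))
```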

# Common suggest options:
* `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.

# Common fuzzy suggest options
* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum corrections to be returned per suggest text token.
* `sort` - Defines how suggestions should be sorted per suggest text term. Two possible values:
** `score` - Sort by score first, then document frequency and then the term itself.
** `frequency` - Sort by document frequency first, then similarity score and then the term itself.
* `suggest_mode` - The suggest mode controls which suggestions are included, or for which suggest text terms suggestions should be provided. Three possible values can be specified:
** `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
** `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
** `always` - Suggest any matching suggestions based on terms in the suggest text.
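
The two `sort` values can be sketched as sort keys over the option objects. The sort directions (higher score and frequency first, terms ascending) are an assumption that happens to match the ordering in the example response above; `sort_options` is a hypothetical helper.

```python
def sort_options(options, sort="score"):
    """Order suggestion options the way the `sort` option describes."""
    if sort == "score":
        # score first, then document frequency, then the term itself
        key = lambda o: (-o["score"], -o["freq"], o["text"])
    elif sort == "frequency":
        # document frequency first, then score, then the term itself
        key = lambda o: (-o["freq"], -o["score"], o["text"])
    else:
        raise ValueError("unknown sort: %r" % sort)
    return sorted(options, key=key)


options = [
    {"text": "deploying", "freq": 2, "score": 0.7777778},
    {"text": "developing", "freq": 77, "score": 0.8888889},
    {"text": "deloping", "freq": 1, "score": 0.875},
]
print([o["text"] for o in sort_options(options, "score")])
print([o["text"] for o in sort_options(options, "frequency")])
```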

# Other fuzzy suggest options:
* `lowercase_terms` - Lowercases the suggest text terms after text analysis.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be 1 or 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimum number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur at the beginning of terms.
* `min_query_length` -  The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Due to the fact that terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise.
* `max_inspections` - A factor that is multiplied with the `shard_size` in order to inspect more candidate spell corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified then it cannot be fractional. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly; excluding them also improves the spellcheck performance. The shard level document frequencies are used for this option.
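
The absolute-or-relative convention used by `threshold_frequency` and `max_query_frequency` can be sketched as follows. The helper `resolve_frequency` is hypothetical, and treating a value of exactly 1 as an absolute count is an assumption; the text above only states that values higher than 1 cannot be fractional.

```python
def resolve_frequency(value, num_docs):
    """Turn a frequency option into a document count for a shard.

    Values below 1 are read as a fraction of `num_docs`; values of 1 or
    more are read as an absolute count (assumption for exactly 1).
    """
    if value < 0:
        raise ValueError("frequency must not be negative")
    if value >= 1:
        if value != int(value):
            raise ValueError("absolute frequencies cannot be fractional")
        return int(value)
    return value * num_docs  # relative: fraction of the documents


print(resolve_frequency(0.5, 1000))
print(resolve_frequency(3, 1000))
```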
2013-01-28 15:18:18 +01:00