When we decided to deprecate and remove fuzzy query in #15760, we didn't realize we would take away the possibililty for uses to use a fuzzy query as part of a span query, which is not possible using match query. This means we have to go back and un-deprecate fuzzy query, which will not be removed.
Closes#15760
Currently, the `terms` query is just syctactic sugar for a `bool` query when
used in a query context. This change proposes to always generate the same query
in query and filter contexts, which is less confusing.
These query names were all deprecated in 5.0.0:
- in is removed in favour of terms
- geo_bbox is removed in favour of geo_bounding_box
- mlt is removed in favour of more_like_this
- fuzzy_match and match_fuzzy are removed in favour of match
Set lucene version to 6.4.0-snapshot-ec38570 and update all the sha1s/license
Fix invalid combo after upgrade in query_string query. split_on_whitespace=false is disallowed if auto_generate_phrase_queries=true
Adapt the expectations of some tests to the new format of the Lucene explain output
* master: (22 commits)
Add proper toString() method to UpdateTask (#21582)
Fix `InternalEngine#isThrottled` to not always return `false`. (#21592)
add `ignore_missing` option to SplitProcessor (#20982)
fix trace_match behavior for when there is only one grok pattern (#21413)
Remove dead code from GetResponse.java
Fixes date range query using epoch with timezone (#21542)
Do not cache term queries. (#21566)
Updated dynamic mapper section
Docs: Clarify date_histogram bucket sizes for DST time zones
Handle release of 5.0.1
Fix skip reason for stats API parameters test
Reduce skip version for stats API parameter tests
Strict level parsing for indices stats
Remove cluster update task when task times out (#21578)
[DOCS] Mention "all-fields" mode doesn't search across nested documents
InternalTestCluster: when restarting a node we should validate the cluster is formed via the node we just restarted
Fixed bad asciidoc in boolean mapping docs
Fixed bad asciidoc ID in node stats
Be strict when parsing values searching for booleans (#21555)
Fix time zone rounding edge case for DST overlaps
...
* master: (516 commits)
Avoid angering Log4j in TransportNodesActionTests
Add trace logging when aquiring and releasing operation locks for replication requests
Fix handler name on message not fully read
Remove accidental import.
Improve log message in TransportNodesAction
Clean up of Script.
Update Joda Time to version 2.9.5 (#21468)
Remove unused ClusterService dependency from SearchPhaseController (#21421)
Remove max_local_storage_nodes from elasticsearch.yml (#21467)
Wait for all reindex subtasks before rethrottling
Correcting a typo-Maan to Man-in README.textile (#21466)
Fix InternalSearchHit#hasSource to return the proper boolean value (#21441)
Replace all index date-math examples with the URI encoded form
Fix typos (#21456)
Adapt ES_JVM_OPTIONS packaging test to ubuntu-1204
Add null check in InternalSearchHit#sourceRef to prevent NPE (#21431)
Add VirtualBox version check (#21370)
Export ES_JVM_OPTIONS for SysV init
Skip reindex rethrottle tests with workers
Make forbidden APIs be quieter about classpath warnings (#21443)
...
This commit introduces a new execution mode for the
`simple_query_string` query, which is intended down the road to be a
replacement for the current _all field.
It now does auto-field-expansion and auto-leniency when the following criteria
are ALL met:
The _all field is disabled
No default_field has been set in the index settings
No fields are specified in the request
Additionally, a user can force the "all-like" execution by setting the
all_fields parameter to true.
When executing in all field mode, the `simple_query_string` query will
look at all the fields in the mapping that are not metafields and can be
searched, and automatically expand the list of fields that are going to
be queried.
Relates to #20925, which is the `query_string` version of this work.
This is basically the same behavior, but for the `simple_query_string`
query.
Relates to #19784
This commit introduces a new execution mode for the query_string query, which
is intended down the road to be a replacement for the current _all field.
It now does auto-field-expansion and auto-leniency when the following criteria
are ALL met:
The _all field is disabled
No default_field has been set in the index settings
No default_field has been set in the request
No fields are specified in the request
Additionally, a user can force the "all-like" execution by setting the
all_fields parameter to true.
When executing in all field mode, the query_string query will look at all the
fields in the mapping that are not metafields and can be searched, and
automatically expand the list of fields that are going to be queried.
Relates to #19784
This query is deprecated from 5.0 on. Similar to IndicesQueryBuilder we should
log a deprecation warning whenever this query is used.
Relates to #15760
Lucene 6.2 introduces the new `Analyzer.normalize` API, which allows to apply
only character-level normalization such as lowercasing or accent folding, which
is exactly what is needed to process queries that operate on partial terms such
as `prefix`, `wildcard` or `fuzzy` queries. As a consequence, the
`lowercase_expanded_terms` option is not necessary anymore. Furthermore, the
`locale` option was only needed in order to know how to perform the lowercasing,
so this one can be removed as well.
Closes#9978
This change adds an option called `split_on_whitespace` which prevents the query parser to split free text part on whitespace prior to analysis. Instead the queryparser would parse around only real 'operators'. Default to true.
For instance the query `"foo bar"` would let the analyzer of the targeted field decide how the tokens should be splitted.
Some options are missing in this change but I'd like to add them in a follow up PR in order to be able to simplify the backport in 5.x. The missing options (changes) are:
* A `type` option which similarly to the `multi_match` query defines how the free text should be parsed when multi fields are defined.
* Simple range query with additional tokens like ">100 50" are broken when `split_on_whitespace` is set to false. It should be possible to preserve this syntax and make the parser aware of this special syntax even when `split_on_whitespace` is set to false.
* Since all this options would make the `query_string_query` very similar to a match (multi_match) query we should be able to share the code that produce the final Lucene query.
Refactored ScriptType to clean up some of the variable and method names. Added more documentation. Deprecated the 'in' ParseField in favor of 'stored' to match the indexed scripts being replaced by stored scripts.
Hi all,
I was trying to run the percolate examples, but I figured that because of the "type":"keyword" , the code wasn't working.
In the saerch query the "message" : "A new bonsai tree in the office" is a pure string.
I changed it to "text".
- Using log() to indicate natural log can add some confusion when trying to further adjust/tweak scores. Other parts of the API (field_value_factor on this same page) use 'ln' and 'log', so this change should be more consistent
- Fixes#20027
- I generated the images using http://latex2png.com/ at a resolution of 150 which seemed to be about the same size as before
Deprecates the optimize_bbox parameter on geodistance queries. This has no longer been needed since version 2.2 because lucene geo distance queries (postings and LatLonPoint) already optimize by bounding box.
Adds `warnings` syntax to the yaml test that allows you to expect
a `Warning` header that looks like:
```
- do:
warnings:
- '[index] is deprecated'
- quotes are not required because yaml
- but this argument is always a list, never a single string
- no matter how many warnings you expect
get:
index: test
type: test
id: 1
```
These are accessible from the docs with:
```
// TEST[warning:some warning]
```
This should help to force you to update the docs if you deprecate
something. You *must* add the warnings marker to the docs or the build
will fail. While you are there you *should* update the docs to add
deprecation warnings visible in the rendered results.
* [TEST] wait for yellow after setup doc tests
We have many places in the doc where we expect and index to be
yellow before we execute a query. Therefore we have to
always wait for yellow after setup.
Before the query extraction would have been aborted and the percolator query would be marked as unknown.
This resulted in a situation that these queries always need to be evaluated by the memory index at search time.
By adding support for this query many more percolator query candidate hits can skip the expensive memory index verification step. For example the `match` query parser returns a MatchNoDocsQuery if the query terms are removed by text analysis (lets query text only contained stop words).
Before 5.0 for it was required that the percolator queries were cached in jvm heap as Lucene queries for two reasons:
1) Performance. The percolator evaluated all percolator queries all the time. There was no pre-selecting queries that are likely to match like we have today.
2) Updates made to percolator queries were visible in realtime, Today these changes are visible in near realtime. So updating no longer requires the percolator to have the queries in jvm heap.
So having the percolator queries in jvm heap via the percolator cache is now less attractive. Especially when there are many percolator queries then these queries can consume many GBs of jvm heap.
Removing the percolator cache does make the percolate query slower compared to how the execution time in 5.0.0-alpha1 and alpha2, but it is still faster compared to 2.x and before.
This change does the following:
- Queries that are currently unsupported such as prefix queries on numeric
fields or term queries on geo fields now throw an error rather than returning
a query that does not match anything.
- Fuzzy queries on numeric, date and ip fields are now unsupported: they used
to create range queries, we now expect users to use range queries directly.
Fuzzy, regexp and prefix queries are now only supported on text/keyword
fields (including `_all`).
- The `_uid` and `_id` fields do not support prefix or range queries anymore as
it would prevent us to store them more efficiently in the future, eg. by
using a binary encoding.
Note that it is still possible to ignore these errors by using the `lenient`
option of the `match` or `query_string` queries.
Currently `fuzziness` is not supported for the `cross_fields` type
of the `multi_match` query since it complicates the logic that
blends the term queries that cross_fields uses internally. At the
moment using this combination is silently ignored, which can lead to
confusions. Instead we should throw an exception in this case.
The same is true for phrase and phrase_prefix type.
Closes#7764
Adds infrastructure so `gradle :docs:check` will extract tests from
snippets in the documentation and execute the tests. This is included
in `gradle check` so it should happen on CI and during a normal build.
By default each `// AUTOSENSE` snippet creates a unique REST test. These
tests are executed in a random order and the cluster is wiped between
each one. If multiple snippets chain together into a test you can annotate
all snippets after the first with `// TEST[continued]` to have the
generated tests for both snippets joined.
Snippets marked as `// TESTRESPONSE` are checked against the response
of the last action.
See docs/README.asciidoc for lots more.
Closes#12583. That issue is about catching bugs in the docs during build.
This catches *some* bugs in the docs during build which is a good start.
* Added an extra `field` parameter to the `percolator` query to indicate what percolator field should be used. This must be an existing field in the mapping of type `percolator`.
* The `.percolator` type is now forbidden. (just like any type that starts with a `.`)
This only applies for new indices created on 5.0 and later. Indices created on previous versions the .percolator type is still allowed to exist.
The new `percolator` field type isn't active in such indices and the `PercolatorQueryCache` knows how to load queries from these legacy indices.
The `PercolatorQueryBuilder` will not enforce that the `field` parameter is of type `percolator`.
The change adds a new option to the geo_* queries: ignore_unmapped. If this option is set to false, the toQuery method on the QueryBuilder will throw an exception if the field specified in the query is unmapped. If the option is set to true, the toQuery method on the QueryBuilder will return a MatchNoDocsQuery. The default value is false so the queries work how they do today (throwing an exception on unmapped field)
The change adds a new option to the `nested`, `has_parent`, `has_children` and `parent_id` queries: `ignore_unmapped`. If this option is set to false, the `toQuery` method on the QueryBuilder will throw an exception if the type/path specified in the query is unmapped. If the option is set to true, the `toQuery` method on the QueryBuilder will return a MatchNoDocsQuery. The default value is `false`so the queries work how they do today (throwing an exception on unmapped paths/types)
Now the `match` query has been split out into `match`, `match_phrase` and `match_phrase_prefix` we need to update the docs to remove the deprecated syntax
* [TEST] check registered queries one by one in SearchModuleTests
* Switch to using ParseField to parse query names
If we have a deprecated query name, at the moment we don't have a way to log any deprecation warning nor fail when we are in strict mode. With this change we use ParseField, which will take care of the camel casing that we currently do manually (so that one day we can remove it more easily). This also means, that each query will have a unique preferred name, and all the other names are deprecated.
Terms query "in" synonym is now formally deprecated, as well as fuzzy_match, match_fuzzy, match_phrase and match_phrase_prefix for match query, mlt for more_like_this and geo_bbox for geo_bounding_box. All these will be removed in 6.0.
Every QueryParser holds now a ParseField constant called QUERY_NAME_FIELD that holds the name for it. The first name is the preferred one, all the others are deprecated. The first name is taken from the NAME constant already present in each query builder object, so that we somehow keep the serialization constant separated from ParseField. This change also allowed us to remove the names method from the QueryParser interface.
Also replaced the PercolatorQueryRegistry with the new PercolatorQueryCache.
The PercolatorFieldMapper stores the rewritten form of each percolator query's xcontext
in a binary doc values field. This make sure that the query rewrite happens only during
indexing (some queries for example fetch shapes, terms in remote indices) and
the speed up the loading of the queries in the percolator query cache.
Because the percolator now works inside the search infrastructure a number of features
(sorting fields, pagination, fetch features) are available out of the box.
The following feature requests are automatically implemented via this refactoring:
Closes#10741Closes#7297Closes#13176Closes#13978Closes#11264Closes#10741Closes#4317
This commit updates the documentation for GeoPointField by removing all references to the coerce and doc_values parameters. DocValues are enabled in lucene GeoPointField by default (required for boundary filtering). The QueryBuilders are updated to automatically normalize points (ignoring the coerce parameter) for any index created onOrAfter version 2.2.
With this commit we deprecate the widely misunderstood
fuzzy query but will still allow the fuzziness
parameter in match queries and suggesters.
Relates to #15760
* Remove remaining 1.x bwc logic.
* Stop storing stored fields and indexed terms. The _parent field's only purpose is to support joins between parent and child type and only storing doc values is sufficient.
* In the mapping the parent field mapper is now known under '{parent}#{child}' key, because this is the field the parent/child join uses too.
* Added new sub fetch phase to lookup that _parent field from doc values field if that is required (before this was fetched from stored _parent field)
* Removed the ability to query directly on `_parent` in the query dsl. Instead the `{parent}#{child}` field should be used. Under the hood a doc values query is used instead of a term query, because only doc values fields are stored now.
* Added a new `parent_id` query to easily query child documents with a specific parent id without having to know what join field to use
* Also in aggregations `_parent` field can't be used any more and `{parent}#{child}` field name should be used instead to aggregate directly on the _parent join field.
This commit adds the following:
* SpatialStrategy documentation to the geo-shape reference docs.
* Updates relation documentation to geo-shape-query reference docs.
* Updates GeoShapeFiledMapper to set points_only to true if TERM strategy is used (to be consistent with documentation)
As a replacement use ExistsQueryBuilder inside a mustNot() clause.
So instead of using `new ExistsQueryBuilder(name)` now use:
`new BoolQueryBuilder().mustNot(new ExistsQueryBuilder(name))`.
Closes#14112
In the example we show an `exists` query inside a constant score query. While this is possible, it can mislead users to think it is necessary so we should remove it.
At the time of geo_shape query conception, CONTAINS was not yet a supported spatial operation in Lucene. Since it is now available this commit adds ShapeRelation.CONTAINS to GeoShapeQuery. Randomized testing is included and documentation is updated.
We already introduced the MatchNoneQueryBuilder query that does not
return any documents, mainly because we needed it for internal
representation of the NONE option in the IndicesQueryBuilder.
However, the query was requested at least once also for the query dsl,
and since we can parser it already we should document it as
`match_none` query in the relevant reference docs as well.
When passing the example json snippets through the query parser while working
on #14249 some of the examples could not be parsed. This PR fixes those
examples.
Relates to #14249
The NotQueryBuilder has been deprecated on the 2.x branches
and can be removed with the next major version. It can be
replaced by boolean query with added mustNot() clause.
Closes#13761
* Dropped ScoreType in favour of Lucene's ScoreMode
* Removed `score_type` option from `has_child` and `has_parent` queries in favour for the already existing `score_mode` option.
* Removed the score mode `sum` in favour for the already existing `total` score mode. (`sum` doesn't exist in Lucene's ScoreMode class)
* If `max_children` is set to `0` it now really means that zero children are allowed to match.
This commit splits HasParentQueryParser into toQuery and fromXContent.
This change also deprecates several keys in favor of simplified settings
and adds basic unittests for HasParentQueryParser.
Relates to #10217
The `multi_match` query groups terms that have the same analyzer together and
then applies the boost of the first query in each group. This is not necessary
given that boosts for each term are already applied another way.
Fixed documentation since the default rewrite method for fuzzy queries is to
select top terms, fixed usage of the fuzzy rewrite method, and removed unused
`rewrite` parameter.
Close#6932
This rewrite method is interesting because it computes scores as if all terms
had the same frequencies, which avoids disappointments with ranking when a fuzzy
query ranks typos first given that they are less frequent than the correct term.
This changes the parameter name `ignore_like` to the more user friendly name
`unlike`. This later feature generates a query from the terms in `A` but not
from the terms in `B`. This translates to a result set which is like `A` but
unlike `B`. We could have further negatively boosted any documents that have
some `B`, but these documents already do not receive any contribution from
having `B`, and would therefore negatively compete with documents having `A`.
Closes#11117
* Cut the `has_child` and `has_parent` queries over to use Lucene's query time global ordinal join. The main benefit of this change is that parent/child queries can now efficiently execute if parent/child queries are wrapped in a bigger boolean query. If the rest of the query only hit a few documents both has_child and has_parent queries don't need to evaluate all parent or child documents any more.
* Cut the `_parent` field over to use doc values. This significantly reduces the on heap memory footprint of parent/child, because the parent id values are never loaded into memory.
Breaking changes:
* The `type` option on the `_parent` field can only point to a parent type that doesn't exist yet, so this means that an existing type/mapping can't become a parent type any longer.
* The `has_child` and `has_parent` queries can no longer be use in alias filters.
All these changes, improvements and breaks in compatibility only apply for indices created with ES version 2.0 or higher. For indices creates with ES <= 2.0 the older implementation is used.
It is highly recommended to re-index all your indices with parent and child documents to benefit from all the improvements that come with this refactoring. The easiest way to achieve this is by using the scan and bulk apis using a simple script.
Closes#6107Closes#8134
This change unifies the way scripts and templates are specified for all instances in the codebase. It builds on the Script class added previously and adds request building and parsing support as well as the ability to transfer script objects between nodes. It also adds a Template class which aims to provide the same functionality for template APIs
Closes#11091
These clauses filter the document space without affecting scoring and map to
Lucene's BooleanClause.Occur.FILTER. The `filtered` query is now deprecated and
```json
{
"filtered": {
"query": { //query },
"filter": { //filter }
}
}
```
should be replaced with
```json
{
"bool": {
"must": { //query },
"filter": { //filter }
}
}
```
Meta fields were locked down to not allow exotic options to the
underlying field types in #8143. This change fixes the docs
to no longer refer to the old settings.
closes#10879
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.
This removes Elasticsearch's filter cache and uses Lucene's instead. It has some
implications:
- custom cache keys (`_cache_key`) are unsupported
- decisions are made internally and can't be overridden by users ('_cache`)
- not only filters can be cached but also all queries that do not need scores
- parent/child queries can now be cached, however cached entries are only
valid for the current top-level reader so in practice it will likely only
be used on read-only indices
- the cache deduplicates filters, which plays nicer with large keys (eg. `terms`)
- better stats: we already had ram usage and evictions, but now also hit count,
miss count, lookup count, number of cached doc id sets and current number of
doc id sets in the cache
- dynamically changing the filter cache size is not supported anymore
Internally, an important change is that it removes the NoCacheFilter infrastructure
in favour of making Query.rewrite specializing the query for the current reader so
that it will only be cached on this reader (look for IndexCacheableQuery).
Note that consuming filters with the query API (createWeight/scorer) instead of
the filter API (getDocIdSet) is important for parent/child queries because
otherwise a QueryWrapperFilter(ParentQuery) would run the wrapped query per
segment while relations might be cross segments.
Expose new span queries from https://issues.apache.org/jira/browse/LUCENE-6083
Within returns matches from 'little' that are enclosed inside of a match from 'big'.
Containing returns matches from 'big' that enclose matches from 'little'.
field_value_factor now takes a default that is used if the document doesn't
have a value for that field. It looks like:
"field_value_factor": {
"field": "popularity",
"missing": 1
}
Closes#10841
Hi there. I've been experimenting with the search templates recently and I'm a bit confused. Shouldn't the Mustache tags be written like `{{tagname}}` instead of `{tagname}`? Your using `{{...}}` [here](http://www.elastic.co/guide/en/elasticsearch/reference/current/search-template.html) BTW.
Using the first example in that page seems to indicate that something's wrong, or am I missing something?
```
$ curl 'localhost:9200/test/_search' -d '{"query":{"template":{"query":{"match":{"text":"{keywords}"}},"params":{"keywords":"value1_foo"}}}}'
{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
$ curl 'localhost:9200/test/_search' -d '{"query":{"template":{"query":{"match":{"text":"{{keywords}}"}},"params":{"keywords":"value1_foo"}}}}'
{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"test","_type":"testtype","_id":"1","_score":1.0,"_source":{"text":"value1_foo"}}]}}
```
In Lucene 5.1 lots of filters got deprecated in favour of equivalent queries.
Additionally, random-access to filters is now replaced with approximations on
scorers. This commit
- replaces the deprecated NumericRangeFilter, PrefixFilter, TermFilter and
TermsFilter with NumericRangeQuery, PrefixQuery, TermQuery and TermsQuery,
wrapped in a QueryWrapperFilter
- replaces XBooleanFilter, AndFilter and OrFilter with a BooleanQuery in a
QueryWrapperFilter
- removes DocIdSets.isBroken: the new two-phase iteration API will now help
execute slow filters efficiently
- replaces FilterCachingPolicy with QueryCachingPolicy
Close#8960
This adds a new feature to the Term Vectors API which allows for filtering of
terms based on their tf-idf scores. With `dfs` option on, this could be useful
for finding out a good characteric vector of a document or a set of documents.
The parameters are similar to the ones used in the MLT Query.
Closes#9561
This is really a Collector instead of a filter. This commit deprecates the
`limit` filter, makes it a no-op and recommends to use the `terminate_after`
parameter instead that we introduced in the meantime.
Relates to #10154 and #10150
Adds link to additional information on how document frequencies are treated across shards to the cutoff_frequency parameter documentation.
Closes#10451
The analysis chain should be used instead of relying on this, as it is
confusing when dealing with different per-field analysers.
The `locale` option was only used for `lowercase_expanded_terms`, which,
once removed, is no longer needed, so it was removed as well.
Fixes#9978
Relates to #9973
Double negatives are confusing, but a triple negative (1 no, 2 non, 3 null)? It takes five minutes to understand this little sentence. Cleaned that up a bit.
Closes#9789
This is our only cache which is not 'exact' and might allow for stalled results.
Additionally, a similar cache that we have and needs to perform lookups in other
indices in order to run queries is the script index, and for this index we rely
on the filesystem cache, so we should probably do the same with terms filters
lookups.
Close#9056
Up to now, all filters could be cached using the `_cache` flag that could be
set to `true` or `false` and the default was set depending on the type of the
`filter`. For instance, `script` filters are not cached by default while
`terms` are. For some filters, the default is more complicated and eg. date
range filters are cached unless they use `now` in a non-rounded fashion.
This commit adds a 3rd option called `auto`, which becomes the default for
all filters. So for all filters a cache wrapper will be returned, and the
decision will be made at caching time, per-segment. Here is the default logic:
- if there is already a cache entry for this filter in the current segment,
then return the cache entry.
- else if the doc id set cannot iterate (eg. script filter) then do not cache.
- else if the doc id set is already cacheable and it has been used twice or
more in the last 1000 filters then cache it.
- else if the filter is costly (eg. multi-term) and has been used twice or more
in the last 1000 filters then cache it.
- else if the doc id set is not cacheable and it has been used 5 times or more
in the last 1000 filters, then load it into a cacheable set and cache it.
- else return the uncached set.
So for instance geo-distance filters and script filters are going to use this
new default and are not going to be cached because of their iterators.
Similarly, date range filters are going to use this default all the time, but
it is very unlikely that those that use `now` in a not rounded fashion will get
reused so in practice they won't be cached.
`terms`, `range`, ... filters produce cacheable doc id sets with good iterators
so they will be cached as soon as they have been used twice.
Filters that don't produce cacheable doc id sets such as the `term` filter will
need to be used 5 times before being cached. This ensures that we don't spend
CPU iterating over all documents matching such filters unless we have good
evidence of reuse.
One last interesting point about this change is that it also applies to compound
filters. So if you keep on repeating the same `bool` filter with the same
underlying clauses, it will be cached on its own while up to now it used to
never be cached by default.
`_cache: true` has been changed to only cache on large segments, in order to not
pollute the cache since small segments should not be the bottleneck anyway.
However `_cache: false` still has the same semantics.
Close#8449
I replaced "high frequent terms" with "high frequency terms" and "low frequent terms" with "low frequency terms".
Alternatively, we could write, "highly frequent terms" and "minimally frequent terms" (or just "rare terms").
Closes#8962
Adds a `ignore_like` parameter to the MLT Query, which simply tells the
algorithm to skip all the terms from the given documents. This could be useful
in order to better guide nearest neighbor search by telling the algorithm to
never explore the space spanned by the given `ignore_like` docs. In essence we
are interested about the characteristic of a given item, but not of the ones
provided by `ignore_like`, thereby forcing the algorithm to go deeper in its
selection of terms. Note that this is different than simply performing a must
not boolean query on the unliked items. The syntax is exactly the same as the
`like` parameter.
Closes#8674
functon_score matched each document regardless of the computed score.
This commit adds a query parameter `min_score` (-Float.MAX_VALUE default).
Documents that have a score lower than this threshold will not be mached.
closes#6952
This prevents too-difficult regular expressions from consuming
excessive RAM/CPU; the default max_determinized_states is 10,000 (same
as Lucene) but query_string and regepx query/filter can override
per-request.
The also upgrades to a new Lucene 5.0.0 snapshot.
Closes#8386Closes#8357
This has a lot of improvements in lucene, particularly around memory usage, merging, safety, compressed bitsets, etc.
On the elasticsearch side, summary of the larger changes:
API changes: postings API became a "pull" rather than "push", collector API became per-segment, etc.
packaging changes: add lucene-backwards-codecs.jar as a dependency.
improvements to boolean filtering: especially ensuring it will not be slow for SparseBitSet.
use generic BitSet api in plumbing so that concrete bitset type is an implementation detail.
use generic BitDocIdSetFilter api for dedicated bitset cache, so there is type safety.
changes to support atomic commits
implement Accountable.getChildResources (detailed memory usage API) for fielddata, etc
change handling of IndexFormatTooOld/New, since they no longer extends CorruptIndexException
Closes#8347.
Squashed commit of the following:
commit d90d53f5f21b876efc1e09cbd6d63c538a16cd89
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Nov 5 21:35:28 2014 +0100
Make default codec/postings/docvalues format constants
commit cb66c22c71cd304a36e7371b199a8c279908ae37
Merge: d4e2f6d ad4ff43
Author: Robert Muir <rmuir@apache.org>
Date: Wed Nov 5 11:41:13 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit d4e2f6dfe767a5128c9b9ae9e75036378de08f47
Merge: 4e5445c 4111d93
Author: Robert Muir <rmuir@apache.org>
Date: Wed Nov 5 06:26:32 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 4e5445c775f580730eb01360244e9330c0dc3958
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 16:19:19 2014 -0500
FixedBitSet -> BitSet
commit 9887ea73e8b857eeda7f851ef3722ef580c92acf
Merge: 1bf8894 fc84666
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 15:26:25 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 1bf8894430de3e566d0dc5623b0cc28b0d674ebb
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 15:22:51 2014 -0500
remove nocommit
commit a9c2a2259ff79c69bae7806b64e92d5f472c18c8
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 13:48:43 2014 -0500
turn jenkins red again
commit 067baaaa4d52fce772c81654dcdb5051ea79139f
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 13:18:21 2014 -0500
unzip from stream
commit 82b6fba33d362aca2313cc0ca495f28f5ebb9260
Merge: b2214bb 6523cd9
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 13:10:59 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit b2214bb093ec2f759003c488c3c403c8931db914
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 13:09:53 2014 -0500
go back to my URL until we can figure out what is up with jenkins
commit e7d614172240175a51f580aeaefb6460d21cede9
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 10:52:54 2014 -0500
try this jenkins
commit 337a3c7704efa7c9809bf373152d711ee55f876c
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Nov 4 16:17:49 2014 +0100
Rename temp-files under lock to prevent metadata reads while renaming
commit 77d5ba80d0a76efa549dd753b9f114b2f2d2d29c
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 10:07:11 2014 -0500
continue to treat too-old/too-new as corruption for now
commit 98d0fd2f4851bc50e505a94ca592a694d502c51c
Author: Robert Muir <rmuir@apache.org>
Date: Tue Nov 4 09:24:21 2014 -0500
fix last nocommit
commit 643fceed66c8caf22b97fc489d67b4a2a90a1a1c
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Nov 4 14:46:17 2014 +0100
remove NoSuchDirectoryException
commit 2e43c4feba05cfaf451df70f946c0930cbcc4557
Merge: 93826e4 8163107
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Nov 4 14:38:00 2014 +0100
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 93826e4d56a6a97c2074669014af77ff519bde63
Merge: 7f10129 44e24d3
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Nov 4 12:54:27 2014 +0100
Merge branch 'master' into enhancement/lucene_5_0_upgrade
Conflicts:
src/main/java/org/elasticsearch/index/store/DistributorDirectory.java
src/main/java/org/elasticsearch/index/store/Store.java
src/main/java/org/elasticsearch/indices/recovery/RecoveryStatus.java
src/test/java/org/elasticsearch/index/store/DistributorDirectoryTest.java
src/test/java/org/elasticsearch/index/store/StoreTest.java
src/test/java/org/elasticsearch/indices/recovery/RecoveryStatusTests.java
commit 7f10129364623620575c109df725cf54488b3abb
Author: Adrien Grand <jpountz@gmail.com>
Date: Tue Nov 4 11:32:24 2014 +0100
Fix TopHitsAggregator to not ignore the top-level/leaf collector split.
commit 042fadc8603b997bdfdc45ca44fec70dc86774a6
Author: Adrien Grand <jpountz@gmail.com>
Date: Tue Nov 4 11:31:20 2014 +0100
Remove MatchDocIdSet in favor of DocValuesDocIdSet.
commit 7d877581ff5db585a674c95ac391ac78a0282826
Author: Adrien Grand <jpountz@gmail.com>
Date: Tue Nov 4 11:10:08 2014 +0100
Make the and filter use the cost API.
Lucene 5 ensured that cost() can safely be used, and this will have the benefit
that the order in which filters are specified is not important anymore (only
for slow random-access filters in practice).
commit 78f1718aa2cd82184db7c3a8393e6215f43eb4a8
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 23:55:17 2014 -0500
fix previous eclipse import braindamage
commit 186c40e9258ce32f22a9a714ab442a310b6376e0
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 22:32:34 2014 -0500
allow child queries to exhaust iterators again
commit b0b1271305e1b6d0c4c4da51a3c54df1aa5c0605
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Nov 3 14:50:44 2014 -0800
Fix nocommit for mapping output. index_options will not be printed if
the field is not indexed.
commit ba223eb85e399c9620a347a983e29bf703953e7a
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Nov 3 14:07:26 2014 -0800
Remove no commit for chinese analyzer provider. We should have a
separate issue to address not using this provider on new indexes.
commit ca554b03c4471797682b2fb724f25205cf040c4a
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Nov 3 13:41:59 2014 -0800
Fix stop tests
commit de67c4653ec47dee9c671390536110749d2bb05f
Author: Ryan Ernst <ryan@iernst.net>
Date: Mon Nov 3 12:51:17 2014 -0800
Remove analysis nocommits, switching over to Lucene43*Filters for
backcompat
commit 50cae9bec72c25c33a1ab8a8931bccb3355171e2
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 15:32:25 2014 -0500
add ram accounting and TODO lazy-loading (its no worse than master, can be a followup improvement) for suggesters
commit 7a7f0122f138684b312d0f0b03dc2a9c16c15f9c
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 15:11:26 2014 -0500
bump lucene version
commit cd0cae5c35e7a9e049f49ae45431f658fb86676b
Merge: 446bc09 3c72073
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 14:49:05 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 446bc09b4e8bf4602d3c252b53ddaa0da65cce2f
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 14:46:30 2014 -0500
remove hack
commit a19d85a968d82e6d00292b49630ef6ff2dbf2f32
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 12:53:11 2014 -0500
dont create exceptions with circular references on corruption (will open a PR for this)
commit 0beefb9e821d97c37e90ec556d81ac7b00369b8a
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 11:47:14 2014 -0500
temporarily add craptastic detector for this horrible bug
commit e9f2d298bff75f3d1591f8622441e459c3ce7ac3
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 10:56:01 2014 -0500
add nocommit
commit e97f1d50a91a7129650b8effc7a9ecf74ca0569a
Merge: c57a3c8 f1f50ac
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 10:12:12 2014 -0500
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit c57a3c8341ed61dca62eaf77fad6b8b48aeb6940
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 10:11:46 2014 -0500
fix nocommit
commit dd0e77e4ec07c7011ab5f6b60b2ead33dc2333d2
Author: Robert Muir <rmuir@apache.org>
Date: Mon Nov 3 09:54:09 2014 -0500
nocommit -> TODO, this is in much more places in the codebase, bigger issue
commit 3cc3bf56d72d642059f8fe220d6f2fed608363e9
Author: Ryan Ernst <ryan@iernst.net>
Date: Sat Nov 1 23:59:17 2014 -0700
Remove nocommit and awaitsfix for edge ngram filter test.
commit 89f115245155511c0fbc0d5ee62e63141c3700c1
Author: Ryan Ernst <ryan@iernst.net>
Date: Sat Nov 1 23:57:44 2014 -0700
Fix EdgeNGramTokenFilter logic for version <= 4.3, and fixed instanceof
checks in corresponding tests to correctly check for reverse filter when
applicable.
commit 112df869cd199e36aab0e1a7a288bb1fdb2ebf1c
Author: Robert Muir <rmuir@apache.org>
Date: Sun Nov 2 00:08:30 2014 -0400
execute geo disjoint query/filter as intersects
commit e5061273cc685f1252e9a3a9ae4877ec9bce7752
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:58:59 2014 -0400
remove chinese analyzer from docs
commit ea1af11b8978fcc551f198e24fe21d52806993ef
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:29:00 2014 -0400
fix ram accounting bug
commit 53c0a42c6aa81aa6bf81d3aa77b95efd513e0f81
Merge: e3bcd3c 6011a18
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:16:29 2014 -0400
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit e3bcd3cc07a4957e12c7b3affc462c31290a9186
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:15:01 2014 -0400
fix url-email back compat (thanks ryan)
commit 91d6b096a96c357755abee167098607223be1aad
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 22:11:26 2014 -0400
bump lucene version
commit d2bb9568df72b37ec7050d25940160b8517394bc
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 20:33:07 2014 -0400
remove nocommit
commit 1d049c471e19e5c457262c7399c5bad9e023b2e3
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 20:28:58 2014 -0400
fix eclipse to group org/com imports together: without this, its madness
commit 09d8c1585ee99b6e63be032732c04ef6fed84ed2
Author: Robert Muir <rmuir@apache.org>
Date: Sat Nov 1 14:27:41 2014 -0400
remove nocommit, if you dont liek it, print assembly and tell me how it can be better
commit 8a6a294313fdf33b50c7126ec20c07867ecd637c
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 20:01:55 2014 +0100
Remove deprecated usage of DocIdSets.newDocIDSet.
commit 601bee60543610558403298124a84b1b3bbd1045
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 14:13:18 2014 -0400
maybe one of these zillions of annotations will stop thread leaks
commit 9d3f69abc7267c5e455aefa26db95cb554b02d62
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 14:05:39 2014 -0400
fix some analysis nocommits
commit 312e3a29c77214b8142d21c33a6b2c2b151acf9a
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 18:28:45 2014 +0100
Remove XConstantScoreQuery/XFilteredQuery/ApplyAcceptedDocsFilter.
commit 5a0cb9f8e167215df7f1b1fad11eec6e6c74940f
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 17:06:45 2014 +0100
Fix misleading documentation of DocIdSets.toCacheable.
commit 8b4ef2b5b476fff4c79c0c2a0e4769ead26cf82b
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 17:05:59 2014 +0100
Fix CustomRandomAccessFilterStrategy to override the right method.
commit d7a9a407a615987cfffc651f724fbd8795c9c671
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 16:21:35 2014 +0100
Better handle the special case when there is a single SHOULD clause.
commit 648ad389f07e92dfc451f345549c9841ba5e4c9a
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 15:53:38 2014 +0100
Cut over XBooleanFilter to BitDocIdSet.Builder.
The idea is similar to what happened to Lucene's BooleanFilter.
Yet XBooleanFilter is a bit more sophisticated and I had to slightly
change the way it is implemented in order to make it work. The main difference
with before is that slow filters are now applied lazily, so eg. if you have 3
MUST clauses, two with a fast iterator and the third with a slow iterator, the
previous implementation used to apply the fast iterators first and then only
check the slow filter for bits which were set in the bit set. Now we are
computing a bit set based on the fast must clauses and then basically returning
a BitsFilteredDocIdSet.wrap(bitset, slowClause).
Other than that, BooleanFilter still uses the bitset optimizations when or-ing
and and-ind filters.
Another improvement is that BooleanFilter is now aware of the cost API.
commit b2dad312b4bc9f931dc3a25415dd81c0d9deee08
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 10:18:53 2014 -0400
clear nocommit
commit 4851d2091e744294336dfade33906c75fbe695cd
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 15:15:16 2014 +0100
cut over to RoaringDocIdSet
commit ca6aec24a901073e65ce4dd6b70964fd3612409e
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:57:30 2014 +0100
make nocommit more explicit
commit d0742ee2cb7a6c48b0bbb31580b7fbcebdb6ec40
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 09:55:24 2014 -0400
fix standardtokenizer nocommit
commit 7d6faccafff22a86af62af0384838391d46695ca
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:54:08 2014 +0100
fix compilation
commit a038a405c1ff6458ad294e6b5bc469e622f699d0
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:53:43 2014 +0100
fix compilation
commit 30c9e307b1f5d80e2deca3392c0298682241207f
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:52:35 2014 +0100
fix compilation
commit e5139bc5a0a9abd2bdc6ba0dfbcb7e3c2e7b8481
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 09:52:16 2014 -0400
clear nocommit here
commit 85dd2cedf7a7994bed871ac421cfda06aaf5c0a5
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:46:17 2014 +0100
fix CompletionPostingsFormatTest
commit c0f3781f616c9b0ee3b5c4d0998810f595868649
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 09:38:00 2014 -0400
add tests for these analyzers
commit 51f9999b4ad079c283ae762c862fd0e22d00445f
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 14:10:26 2014 +0100
remove nocommit - this is not an issue
commit fd1388fa03e622b0738601c8aeb2dbf7949a6dd2
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Fri Oct 31 14:07:01 2014 +0100
Remove redundant null check
commit 3d6dd51b0927337ba941a235446b22e8cd500dc3
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Fri Oct 31 14:01:37 2014 +0100
Removed the work around to prevent p/c error when invoking #iterator() twice, because the custom query filter wrapper now doesn't transform the result to a cache doc id set any more.
I think the transforming to a cachable doc id set in CustomQueryWrappingFilter isn't needed at all, because we use the DocIdSet only once and because of that is just slowed things down.
commit 821832a537e00cd1216064b379df3e01d2911d3a
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 13:54:33 2014 +0100
one more nocommit
commit 77eb9ea4c4ea50afb2680c29682ddcb3851a9d4f
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Fri Oct 31 13:52:29 2014 +0100
Remove cast
commit a400573c034ed602221f801b20a58a9186a06eae
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 13:49:24 2014 +0100
fix stop filter
commit 51746087cf8ec34c4d20aa05ba8dbff7b3b43eec
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 13:21:36 2014 +0100
fix changed semantics of FBS.nextSetBit to check for NO_MORE_DOCS
commit 8d0a4e2511310f1293860823fe3ba80ac771bbe3
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 08:13:44 2014 -0400
do the bogus cast differently
commit 46a5cc5732dea096c0c80ae5ce42911c9c51e44e
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Oct 31 13:00:16 2014 +0100
I hate it but P/C now passes
commit 580c0c2f82bbeacf217e594f22312b11d1bdb839
Merge: a9d3c00 1645434
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 31 06:54:31 2014 -0400
fix nocommit/classcast
commit a9d3c004d62fe04989f49a897e6ff84973c06eb9
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Oct 31 08:49:31 2014 +0100
Update TODO.
commit aa75af0b407792aeef32017f03a6f442ed970baa
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 19:18:25 2014 -0400
clear obselete nocommits from lucene bump
commit d438534cf41fcbe2d88070e2f27c994625e082c2
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 18:53:20 2014 -0400
throw classcastexception when ES abuses regular filtercache for nested docs
commit 2c751f3a8feda43ec127c34769b069de21f3d16f
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 18:31:34 2014 -0400
bump lucene revision, fix tests
commit d6ef7f6304ae262bf6228a7d661b2a452df332be
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 22:37:58 2014 +0100
fix merge problems
commit de9d361f88a9ce6bb3fba85285de41f223c95767
Merge: 41f6aab f6b37a3
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 22:28:59 2014 +0100
Merge branch 'master' into enhancement/lucene_5_0_upgrade
Conflicts:
pom.xml
src/main/java/org/elasticsearch/Version.java
src/main/java/org/elasticsearch/gateway/local/state/meta/MetaDataStateFormat.java
commit 41f6aab388aa80c40b08a2facab2617576203a0d
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 17:48:46 2014 +0100
fix potiential NPE
commit c4428b12e1ae838b91e847df8b4a8be7f49e10f4
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 17:38:46 2014 +0100
don't advance iterator in a match(doc) method
commit 28ab948e99e3ea4497c9b1e468384806ba7e1790
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 17:34:58 2014 +0100
don't advance iterator in a match(doc) method
commit eb0f33f6634fadfcf4b2bf7327400e568f0427bb
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 16:55:54 2014 +0100
fix GeoUtilsTest
commit 7f711fe3eaf73b6c2268cf42d5a41132a61ad831
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 16:43:16 2014 +0100
Use a dedicated default index option if field type is not indexed by default
commit 78e3f37ab779e3e1b25b45a742cc86ab5f975149
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 10:56:14 2014 -0400
disable this test with AwaitsFix to reduce noise
commit 9a590f563c8e03a99ecf0505c92d12d7ab20d11d
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 09:38:49 2014 +0100
fix lucene version
commit abe3ca1d8bb6b5101b545198f59aec44bacfa741
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Oct 30 09:35:05 2014 +0100
fix AnalyzingCompletionLookupProvider to wrok with new codec API
commit 464293b245852d60bde050c6d3feb5907dcfbf5f
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 00:26:00 2014 -0400
don't try to write stuff to tests class directory
commit 031cc6c19f4fe4423a034b515f77e5a0e282a124
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 00:12:36 2014 -0400
AwaitsFix these known issues to reduce noise
commit 4600d51891e35847f2d344247d6f915a0605c0d1
Author: Robert Muir <rmuir@apache.org>
Date: Thu Oct 30 00:06:53 2014 -0400
openbitset lives on
commit 8492bae056249e2555d24acd55f1046b66a667c4
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 23:42:54 2014 -0400
fixes for filter tests
commit 31f24ce4efeda31f97eafdb122346c7047a53bf2
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 23:12:38 2014 -0400
don't use fieldcache
commit 8480789942fdff14a6d2b2cd8134502fe62f20c8
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 23:04:29 2014 -0400
ancient index no longer supported
commit 02e78dc7ebdd827533009f542582e8db44309c57
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 23:37:02 2014 +0100
fix more tests
commit ff746c6df23c50b3f3ec24922413b962c8983080
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 23:08:19 2014 +0100
fix all mapper
commit e4fb84b517107b25cb064c66f83c9aa814a311b2
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 22:55:54 2014 +0100
fix distributor tests and cut over to FileStore API
commit 20c850e2cfe3210cd1fb9e232afed8d4ac045857
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 22:42:18 2014 +0100
use DOCS_ONLY if index=true and current options == null
commit 44169c108418413cfe51f5ce23ab82047463e4c2
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 22:33:36 2014 +0100
Fix index=yes|no settings in mappers
commit a3c5f77987461a18121156ed345d42ded301c566
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 21:51:41 2014 +0100
fix several field mappers conversion from setIndexed to indexOptions
commit df84d736908e88a031d710f98e222be68ae96af1
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 21:33:35 2014 +0100
fix SourceFieldMapper to be not indexed
commit b2bf01d12a8271a31fb2df601162d0e89924c8f5
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 21:23:08 2014 +0100
Cut over to .liv files in store and corruption tests
commit 619004df436f9ef05d24bef1b6a7f084c6b0ad75
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 17:05:52 2014 +0100
fix more tests
commit b7ed653a8b464de446e00456bce0a89e47627c38
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 16:19:08 2014 +0100
[STORE] Add dedicated method to write temporary files
Recovery writes temporary files which might not end up in the
right distributor directories today. This commit adds a dedicated
API that allows specifying the target file name in order to create the
tempoary file in the correct directory.
commit 7d574659f6ae04adc2b857146ad0d8d56ca66f12
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 10:28:49 2014 -0400
add some leniency to temporary bogus method
commit f97022ea7c2259f7a5cf97d924c59ed75ab65b32
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 10:24:17 2014 -0400
fix MultiCollector bug
commit b760533128c2b4eb10ad76e9689ef714293dd819
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:56:08 2014 +0100
CheckIndex is now closeable we need to close it
commit 9dae9fb6d63546a6c2427be2a2d5c8358f5b1934
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:45:11 2014 +0100
s/Lucene51/Lucene50
commit 7aea9b86856a8c1b06a08e7c312ede1168af1287
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:42:30 2014 +0100
fix BloomFilterPostingsFormat
commit 16fea6fe842e88665d59cc091e8224e8dc6ce08c
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:41:16 2014 +0100
fix some codec format issues
commit 3d77aa97dd2c4012b63befef3f2ba2525965e8a6
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:30:43 2014 +0100
fix CodecTests
commit 6ef823b1fde25657438ace1aabd9d552d6ae215e
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:26:47 2014 +0100
make it compile
commit 9991eee1fe99435118d4dd42b297ffc83fce5ec5
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 09:12:43 2014 -0400
add an ugly hack for TopHitsAggregator for now
commit 03e768a01fcae6b1f4cb50bcceec7d42977ac3e6
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Oct 29 14:01:02 2014 +0100
cut over ES090PostingsFormat
commit 463d281faadb794fdde3b469326bdaada25af048
Merge: 0f8740a 8eac79c
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 08:30:36 2014 -0400
Merge branch 'master' into enhancement/lucene_5_0_upgrade
commit 0f8740a782455a63524a5a82169f6bbbfc613518
Author: Robert Muir <rmuir@apache.org>
Date: Wed Oct 29 01:00:15 2014 -0400
fix/hack remaining filter and analysis issues
commit df534488569da13b31d66e581456dfd4b55156b9
Author: Robert Muir <rmuir@apache.org>
Date: Tue Oct 28 23:11:47 2014 -0400
fix ngrams / openbitset usage
commit 11f5dc3b9887f4da80a0fa1818e1350b30599329
Author: Robert Muir <rmuir@apache.org>
Date: Tue Oct 28 22:42:44 2014 -0400
hack over sort comparators
commit 4ebdc754350f512596f6a02770d223e9f5f7975a
Author: Robert Muir <rmuir@apache.org>
Date: Tue Oct 28 21:27:07 2014 -0400
compiler errors < 100
commit 2d60c9e29de48ccb0347dd87f7201f47b67b83a0
Author: Robert Muir <rmuir@apache.org>
Date: Tue Oct 28 03:13:08 2014 -0400
clear some nocommits around ram usage
commit aaf47fe6c0aabcfb2581dd456fc50edf871da758
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 12:27:34 2014 -0400
migrate fieldinfo handling
commit ef6ed6d15d8def71cd880d97249678136cd29fe3
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 12:07:13 2014 -0400
more simple fixes
commit f475e1048ae697dd9da5bd9da445102b0b7bc5b3
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 11:58:21 2014 -0400
more fielddata ram accounting fixes
commit 16b4239eaa9b4262df258257df4f31d39f28a3a2
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 16:47:32 2014 +0100
add missing file
commit 5b542fa2a6da81e36a0c35b8e891a1d8bc58f663
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 16:43:29 2014 +0100
cut over completion posting formats - still some nocommits
commit ecdea49404c4ec4e1b78fb54575825f21b4e096e
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 11:21:09 2014 -0400
fielddata accountable fixes
commit d43da265718917e20c8264abd43342069198fe9c
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 16:19:53 2014 +0100
cut over BloomFilterPostings to new API
commit 29b192ba621c14820175775d01242162b88bd364
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 10:22:51 2014 -0400
fix more analyzers
commit 74b4a0c5283e323a7d02490df469497c722780d2
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 09:54:25 2014 -0400
fix tests
commit 554084ccb4779dd6b1c65fa7212ad1f64f3a6968
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 14:51:48 2014 +0100
maintain supressed exceptions on CorruptIndexException
commit cf882d9112c5e8ef1e9f2b0f800f7aa59001a4f2
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 14:47:17 2014 +0100
commitOnClose=false
commit ebb2a9189ab2f459b7c6c9985be610fd90dfe410
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 14:46:06 2014 +0100
cut over indexwriter closeing in InternalEngine
commit cd21b3d4706f0b562bd37792d077d60832aff65f
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Oct 27 14:38:10 2014 +0100
fix constant
commit f93f900c4a1c90af3a21a4af5735a7536423fe28
Author: Robert Muir <rmuir@apache.org>
Date: Mon Oct 27 09:50:49 2014 -0400
fix test
commit a9a752940b1ab4699a6a08ba8b34afca82b843fe
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Mon Oct 27 09:26:18 2014 +0100
Be explicit about the index options
commit d9ee815babd030fa2ceaec9f467c105ee755bf6b
Author: Simon Willnauer <simonw@apache.org>
Date: Sun Oct 26 20:03:44 2014 +0100
cut over store and directory
commit b3f5c8e39039dd8f5caac0c4dd1fc3b1116e64ca
Author: Robert Muir <rmuir@apache.org>
Date: Sun Oct 26 13:08:39 2014 -0400
more test fixes
commit 8842f2684e3606aae0860c27f7a4c53e273d47fb
Author: Robert Muir <rmuir@apache.org>
Date: Sun Oct 26 12:14:52 2014 -0400
tests manual labor
commit c43de5aec337919a3fdc3638406dff17fc80bc98
Author: Robert Muir <rmuir@apache.org>
Date: Sun Oct 26 11:04:13 2014 -0400
BytesRef -> BytesRefBuilder
commit 020c0d087a2f37566a1db390b0e044ebab030138
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Sun Oct 26 15:53:37 2014 +0100
Moved over to BitSetFilter
commit 48dd1b909e6c52cef733961c9ecebfe4f67109fe
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Sun Oct 26 15:53:11 2014 +0100
Left over Collector api change in ScanContext
commit 6ec248ef63f262bcda400181b838fd9244752625
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Sun Oct 26 15:47:40 2014 +0100
Moved indexed() over to indexOptions != null or indexOptions == null
commit 9937aebfd8546ae4bb652cd976b3b43ac5ab7a63
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date: Sun Oct 26 13:26:31 2014 +0100
Fixed many compile errors. Mainly around the breaking Collector api change in 5.0.
commit fec32c4abc0e3309cf34260c8816305a6f820c9e
Author: Robert Muir <rmuir@apache.org>
Date: Sat Oct 25 11:22:17 2014 -0400
more easy fixes
commit dab22531d801800d17a65dc7c9464148ce8ebffd
Author: Robert Muir <rmuir@apache.org>
Date: Sat Oct 25 09:33:41 2014 -0400
more progress
commit 414767e9a955010076b0497cc4f6d0c1850b48d3
Author: Robert Muir <rmuir@apache.org>
Date: Sat Oct 25 06:33:17 2014 -0400
more progress
commit ad9d969fddf139a8830254d3eb36a908ba87cc12
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 24 14:28:01 2014 -0400
current state of fun
commit 464475eecb0be15d7d084135ed16051f76a7e521
Author: Robert Muir <rmuir@apache.org>
Date: Fri Oct 24 11:42:41 2014 -0400
bump to 5.0 snapshot
The MLT field query is simply replaced by a MLT query set to specififc field.
To simplify code maintenance we should deprecate it in 1.4 and remove it in
2.0.
Closes#8238
The MLT query has a lot of parameters. For example, a set of documents is
specified with either `like_text`, `ids` or `docs`, with at least one
parameter required. This commit groups all the document specification
parameters under one called `like`. The syntax is described below and could
easily be extended to allow for new means of specifying document input. The
`like_text`, `ids` and `docs` parameters are deprecated.
As a single piece text:
{
"query": {
"more_like_this": {
"like": "some text here"
}
}
}
As a single item:
{
"query": {
"more_like_this": {
"like": {
"_index": "imdb",
"_type": "movies",
"_id": "88247"
}
}
}
}
Or as a mixture of all:
{
"query": {
"more_like_this": {
"like": [
"Some random text ...",
{
"_index": "imdb",
"_type": "movies",
"_id": "88247"
},
{
"_index": "imdb",
"_type": "movies",
"doc": {
"title": "Document with an artificial title!"
}
}
]
}
}
}
Closes#8039
Query String query now supports a new `time_zone` option based on JODA time zones.
When using a range on date field, the time zone is applied.
```json
{
"query": {
"query_string": {
"text": "date:[2012 TO 2014]",
"timezone": "Europe/Paris"
}
}
}
```
Closes#7880.
When the date format is defined in mapping, you can not use another format when querying using range date query or filter.
For example, this won't work:
```
DELETE /test
PUT /test/t/1
{
"date": "2014-01-01"
}
GET /test/_search
{
"query": {
"filtered": {
"filter": {
"range": {
"date": {
"from": "01/01/2014"
}
}
}
}
}
}
```
It causes:
```
Caused by: org.elasticsearch.ElasticsearchParseException: failed to parse date field [01/01/2014], tried both date format [dateOptionalTime], and timestamp number
```
It could be nice if we can support at query time another date format just like we support `analyzer` at search time on String fields.
Something like:
```
GET /test/_search
{
"query": {
"filtered": {
"filter": {
"range": {
"date": {
"from": "01/01/2014",
"format": "dd/MM/yyyy"
}
}
}
}
}
}
```
Same for queries:
```
GET /test/_search
{
"query": {
"range": {
"date": {
"from": "01/01/2014",
"format": "dd/MM/yyyy"
}
}
}
}
```
Closes#7189.
This adds a `per_field_analyzer` parameter to the Term Vectors API, which
allows to override the default analyzer at the field. If the field already
stores term vectors, then they will be re-generated. Since the MLT Query uses
the Term Vectors API under its hood, this commits also adds the same ability
to the MLT Query, thereby allowing users to fine grain how each field item
should be processed and analyzed.
Closes#7801
Previously, the only way to specify a document not present in the index was to
use `like_text`. This would usually lead to complex queries made of multiple
MLT queries per document field. This commit adds the ability to the MLT query
to directly specify documents not present in the index (artificial documents).
The syntax is similar to the Percolator API or to the Multi Term Vector API.
Closes#7725
The minimum number of optional should clauses of the generated query to match
can now be set using the more extensive minimum should match syntax. This
makes the `percent_terms_to_match` parameter deprecated, and replaced in favor
to a new `minimum_should_match` parameter.
Closes#7898