Commit Graph

142 Commits

Author SHA1 Message Date
Alan Woodward 71b8494181
Upgrade to lucene 8.0.0-snapshot-ff9509a8df (#39444)
Backport of #39350

Contains the following:

* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
2019-02-27 14:36:08 +00:00
Julie Tibshirani c2e9d13ebd
Default include_type_name to false in the yml test harness. (#38058)
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.

Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
2019-02-01 11:44:13 -08:00
Colin Goodheart-Smithe 21e392e95e
Removes typed calls from YAML REST tests (#37611)
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.

The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
2019-01-30 16:32:58 +00:00
Adrien Grand e9fcb25a28
Upgrade to lucene-8.0.0-snapshot-83f9835. (#37668)
This snapshot uses a new file format for doc-values which is expected to make
advance/advanceExact perform faster on sparse fields:
https://issues.apache.org/jira/browse/LUCENE-8585
2019-01-22 11:44:29 +01:00
Nick Knize b2aa655f46
Upgrade master to lucene-8.0.0-snapshot-a1c6e642aa (#37091)
Updates the master branch to the latest snapshot of Lucene 8.0.
2019-01-02 20:18:19 -06:00
Alan Woodward c7ac9ef826
Upgrade to lucene snapshot 774e9aefbc (#36637)
Includes LUCENE-8607: improvement to MatchAllDocsQuery
2018-12-14 20:30:07 +00:00
Alan Woodward 9ac7359643
Update lucene to snapshot-7e4555a2fd (#36563)
Includes the following:

* Reversion of doc-values changes in LUCENE-8374; we are interested in seeing if this 
  has an effect on benchmarks for node-stats and index-stats
* More improvements to docvalues updates
2018-12-12 20:18:32 +00:00
Nhat Nguyen 3fb5a12b30 Upgrade to Lucene-8.0.0-snapshot-61e448666d (#36518)
Includes:
- LUCENE-8602: Share TermsEnum if possible while applying DV updates
2018-12-12 06:47:40 +01:00
Nhat Nguyen 2a7edca59f
Upgrade to Lucene-8.0.0-snapshot-ef61b547b1 (#36450)
Includes:

- LUCENE-8598: Improve field updates packed values
- LUCENE-8599: Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates
2018-12-10 16:33:49 -05:00
Nhat Nguyen 10feb75eb7
Upgrade to Lucene-8.0.0-snapshot-aaa64d70159 (#36335)
Includes:

LUCENE-8594: DV update are broken for updates on new field
LUCENE-8590: Optimize DocValues update datastructures
LUCENE-8593: Specialize single value numeric DV updates

Relates #36286
2018-12-06 20:33:25 -05:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
Alan Woodward 73ceaad03a
Update to lucene-8.0.0-snapshot-c78429a554 (#36212)
Includes:

* A fix for a bug in Intervals.or() (https://issues.apache.org/jira/browse/LUCENE-8586)
* The ability to disable offset mangling in WordDelimiterGraphFilter
        (https://issues.apache.org/jira/browse/LUCENE-8509)
* BM25Similarity no longer multiplies scores by k1 + 1
2018-12-05 12:43:56 +00:00
Jim Ferenczi e37a0ef844
Upgrade to lucene-8.0.0-snapshot-67cdd21996 (#35816) 2018-11-22 15:42:59 +01:00
Nick Knize 2591f66a33
upgrade to lucene-8.0.0-snapshot-6d9c714052 (#35428) 2018-11-12 10:48:27 -06:00
Nick Knize a5e1f4d3a2 Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (#35224) 2018-11-06 11:55:23 +01:00
Jim Ferenczi 241c74efb2
upgrade to a new snapshot of Lucene 8 (7d0a7782fa) (#33812) 2018-09-18 18:16:40 +02:00
Alan Woodward 39c3234c2f
Upgrade to latest Lucene snapshot (#33505)
* LeafCollector.setScorer() now takes a Scorable
* Scorers may not have null Weights
* IndexWriter.getFlushingBytes() reports how much memory is being used by IW threads writing to disk
2018-09-10 20:51:55 +01:00
Jim Ferenczi 7ad71f906a
Upgrade to a Lucene 8 snapshot (#33310)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309

Closes #32899
2018-09-06 14:42:06 +02:00
Nicholas Knize e162127ff3 Upgrade to Lucene-7.5.0-snapshot-13b9e28f9d
The main feature is the inclusion of bkd backed geo_shape with
INTERSECT, DISJOINT, WITHIN bounding box and polygon query support.
2018-08-09 11:15:02 -05:00
Jim Ferenczi 53ff06e621
Upgrade to Lucene-7.5.0-snapshot-608f0277b0 (#32390)
The main highlight is the removal of the reclaim_deletes_weight in the TieredMergePolicy.
The es setting index.merge.policy.reclaim_deletes_weight is deprecated in this commit and the value is ignored. The new merge policy setting setDeletesPctAllowed should be added in a follow up.
2018-07-27 08:28:51 +02:00
Armin Braun ed3b44fb4c
Handle TokenizerFactory TODOs (#32063)
* Don't replace Replace TokenizerFactory with Supplier, this approach was rejected in #32063 
* Remove unused parameter from constructor
2018-07-17 14:14:02 +02:00
Adrien Grand f023e95ae0
Upgrade to Lucene 7.4.0. (#31529)
This moves Elasticsearch from a recent 7.4.0 snapshot to the GA release.
2018-06-22 16:17:17 +02:00
Nhat Nguyen 8453ca638d
Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360) 2018-06-15 10:58:21 -04:00
Nhat Nguyen abe61159a8
Upgrade to Lucene-7.4.0-snapshot-0a7c3f462f (#31073)
This snapshot includes:
- LUCENE-8341: Record soft deletes in SegmentCommitInfo which will resolve #30851
- LUCENE-8335: Enforce soft-deletes field up-front
2018-06-04 14:18:46 -04:00
Nhat Nguyen 363f1e84ca
Upgrade to Lucene-7.4-snapshot-1cbadda4d3 (#30928)
This snapshot includes LUCENE-8328 which is needed to stabilize CCR builds.
2018-05-29 12:29:52 -04:00
Nhat Nguyen 1918a30237
Upgrade to Lucene-7.4.0-snapshot-cc2ee23050 (#30778)
The new snapshot includes LUCENE-8324 which fixes missing checkpoint
after a fully deletes segment is dropped on flush. This snapshot should
resolves failed tests in the CorruptedFileIT suite.

Closes #30741
Closes #30577
2018-05-22 13:11:48 -04:00
Nhat Nguyen 67d8fc222d
Upgrade to Lucene-7.4.0-snapshot-59f2b7aec2 (#30726)
This snapshot resolves issues related to ShrinkIndexIT.
2018-05-18 18:21:39 -04:00
Nhat Nguyen 519768b5d3
Upgrade to Lucene-7.4-snapshot-6705632810 (#30519)
This snapshot is to include LUCENE-8298 which allows DocValues updates
to reset a value. This is needed for the Lucene rollback work.
2018-05-10 12:31:45 -04:00
Jim Ferenczi dbd857341f
Upgrade to 7.4.0-snapshot-1ed95c097b (#30357)
Upgrade to lucene-7.4.0-snapshot-1ed95c097b

This version contains:
* An Analyzer for Korean
* An IntervalQuery and IntervalsSource that retrieve minimum intervals of positional queries.
* A new API to retrieve matches (offsets and positions) of a query for a single document.
* Support for soft deletes in the index writer.
* A fixed shingle filter that handles index time synonyms.
* Support for emoji sequence in ICUTokenizer (with an upgrade to icu 61.1)
2018-05-04 11:44:22 +02:00
Alan Woodward dccd43af47
Upgrade to lucene 7.3.0 (#29387) 2018-04-05 10:34:44 +01:00
Adrien Grand 3bdfc8f3fb
Upgrade to lucene-7.3.0-snapshot-98a6b3d. (#29298)
Most notable changes include:
 - this release doesn't have the 7.2.1 version constant so I had to create one
 - spatial4j and jts were upgraded
2018-04-03 09:27:14 +02:00
Jim Ferenczi be012b1326
upgrade to lucene 7.2.1 (#28218) 2018-01-15 16:47:46 +01:00
Adrien Grand 77711508b0
Upgrade to Lucene 7.2.0. (#27910) 2017-12-20 14:17:40 +01:00
Adrien Grand 6323bb0d97
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27619)
This new snapshot mostly brings a change to TopFieldCollector which can now
early terminate collection when trackTotalHits is `false`.

As a follow-up, we should replace our usage of
`EarlyTerminatingSortingCollector` with this new option.
2017-12-04 09:40:08 +01:00
Adrien Grand 996990ad1f
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27496)
The main highlight of this new snapshot is that it introduces the opportunity
for queries to opt out of caching. In case a query opts out of caching, not only
will it never be cached, but also no compound query that wraps it will be
cached.
2017-11-28 14:52:42 +01:00
Colin Goodheart-Smithe c1b8140c83
Upgrade to Lucene 7.1 (#27225) 2017-11-02 13:25:33 +00:00
Md. Abdulla-Al-Sun a40c474e10
Added Bengali Analyzer to Elasticsearch with respect to the lucene update(PR#238) 2017-10-05 13:25:05 +02:00
Martijn van Groningen dca787ed8a
upgrade to Lucene 7.1.0 snapshot version 2017-10-05 09:06:56 +02:00
Jason Tedor e0db89bc35 Upgrade to Lucene 7.0.0
This commit upgrades to the GA release of Luence 7!

Relates #26744
2017-09-21 19:19:33 -04:00
Adrien Grand 78681bc9e5 Upgrade to lucene-7.0.0-snapshot-d94a5f0. (#26441) 2017-08-31 09:06:40 +02:00
Adrien Grand f0c1e30544 Upgrade to lucene-7.0.0-snapshot-a128fcb. (#26090) 2017-08-08 13:03:19 +02:00
Adrien Grand 481d5d09b2 Upgrade to lucene-7.0.0-snapshot-00142c9. (#25641)
Lucene 7.0 is feature-frozen now, so there should not be many changes until GA.
2017-07-11 13:58:55 +02:00
Adrien Grand 44e9c0b947 Upgrade to lucene-7.0.0-snapshot-ad2cb77. (#25349)
Most notable changes:
 - better update concurrency: LUCENE-7868
 - TopDocs.totalHits is now a long: LUCENE-7872
 - QueryBuilder does not remove the boolean query around multi-term synonyms:
   LUCENE-7878
 - removal of Fields: LUCENE-7500

For the `TopDocs.totalHits` change, this PR relies on the fact that the encoding
of vInts and vLongs are compatible: you can write and read with any of them as
long as the value can be represented by a positive int.
2017-06-22 12:35:33 +02:00
Adrien Grand 0c117145f6 Upgrade to lucene-7.0.0-snapshot-92b1783. (#25222)
This snapshot has faster range queries on range fields (LUCENE-7828), more
accurate norms (LUCENE-7730) and the ability to use fake term frequencies
(LUCENE-7854).
2017-06-15 09:52:07 +02:00
Nicholas Knize deb7caf4d3 Upgrade to lucene-7.0.0-snapshot-a0aef2f
This commit upgrades master to a current lucene snapshot with commit id a0aef2f.
2017-05-19 10:20:55 -05:00
Ryan Ernst 2a65bed243 Tests: Change rest test extension from .yaml to .yml (#24659)
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
2017-05-16 17:24:35 -07:00
Nik Everett bb06d8ec4f Allow plugins to build pre-configured token filters (#24223)
This changes the way we register pre-configured token filters so that
plugins can declare them and starts to move all of the pre-configured
token filters out of core. It doesn't finish the job because doing
so would make the change unreviewably large. So this PR includes
a shim that keeps the "old" way of registering pre-configured token
filters around.

The Lowercase token filter is special because there is a "special"
interaction between it and the lowercase tokenizer. I'm not sure
exactly what to do about it so for now I'm leaving it alone with
the intent of figuring out what to do with it in a followup.

This also renames these pre-configured token filters from
"pre-built" to "pre-configured" because that seemed like a more
descriptive name.

This is a part of #23658
2017-05-09 14:50:49 -04:00
Ryan Ernst 212f24aa27 Tests: Clean up rest test file handling (#21392)
This change simplifies how the rest test runner finds test files and
removes all leniency.  Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.

closes #20240
2017-04-18 15:07:08 -07:00
Adrien Grand 4632661bc7 Upgrade to a Lucene 7 snapshot (#24089)
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.

Some notes about the change:
 - it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
 - it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
 - two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
2017-04-18 15:17:21 +02:00
Jim Ferenczi 0e95c90e9f Upgrade to Lucene 6.5.0 (#23750) 2017-03-27 15:57:54 +02:00