Commit Graph

147 Commits

Author SHA1 Message Date
Alan Woodward 71b8494181
Upgrade to lucene 8.0.0-snapshot-ff9509a8df (#39444)
Backport of #39350

Contains the following:

* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
2019-02-27 14:36:08 +00:00
Julie Tibshirani c2e9d13ebd
Default include_type_name to false in the yml test harness. (#38058)
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.

Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
2019-02-01 11:44:13 -08:00
Colin Goodheart-Smithe 21e392e95e
Removes typed calls from YAML REST tests (#37611)
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.

The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
2019-01-30 16:32:58 +00:00
Adrien Grand e9fcb25a28
Upgrade to lucene-8.0.0-snapshot-83f9835. (#37668)
This snapshot uses a new file format for doc-values which is expected to make
advance/advanceExact perform faster on sparse fields:
https://issues.apache.org/jira/browse/LUCENE-8585
2019-01-22 11:44:29 +01:00
Nick Knize b2aa655f46
Upgrade master to lucene-8.0.0-snapshot-a1c6e642aa (#37091)
Updates the master branch to the latest snapshot of Lucene 8.0.
2019-01-02 20:18:19 -06:00
Alan Woodward c7ac9ef826
Upgrade to lucene snapshot 774e9aefbc (#36637)
Includes LUCENE-8607: improvement to MatchAllDocsQuery
2018-12-14 20:30:07 +00:00
Alan Woodward 9ac7359643
Update lucene to snapshot-7e4555a2fd (#36563)
Includes the following:

* Reversion of doc-values changes in LUCENE-8374; we are interested in seeing if this 
  has an effect on benchmarks for node-stats and index-stats
* More improvements to docvalues updates
2018-12-12 20:18:32 +00:00
Nhat Nguyen 3fb5a12b30 Upgrade to Lucene-8.0.0-snapshot-61e448666d (#36518)
Includes:
- LUCENE-8602: Share TermsEnum if possible while applying DV updates
2018-12-12 06:47:40 +01:00
Nhat Nguyen 2a7edca59f
Upgrade to Lucene-8.0.0-snapshot-ef61b547b1 (#36450)
Includes:

- LUCENE-8598: Improve field updates packed values
- LUCENE-8599: Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates
2018-12-10 16:33:49 -05:00
Nhat Nguyen 10feb75eb7
Upgrade to Lucene-8.0.0-snapshot-aaa64d70159 (#36335)
Includes:

LUCENE-8594: DV update are broken for updates on new field
LUCENE-8590: Optimize DocValues update datastructures
LUCENE-8593: Specialize single value numeric DV updates

Relates #36286
2018-12-06 20:33:25 -05:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
Alan Woodward 73ceaad03a
Update to lucene-8.0.0-snapshot-c78429a554 (#36212)
Includes:

* A fix for a bug in Intervals.or() (https://issues.apache.org/jira/browse/LUCENE-8586)
* The ability to disable offset mangling in WordDelimiterGraphFilter
        (https://issues.apache.org/jira/browse/LUCENE-8509)
* BM25Similarity no longer multiplies scores by k1 + 1
2018-12-05 12:43:56 +00:00
Jim Ferenczi e37a0ef844
Upgrade to lucene-8.0.0-snapshot-67cdd21996 (#35816) 2018-11-22 15:42:59 +01:00
Nick Knize 2591f66a33
upgrade to lucene-8.0.0-snapshot-6d9c714052 (#35428) 2018-11-12 10:48:27 -06:00
Nick Knize a5e1f4d3a2 Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (#35224) 2018-11-06 11:55:23 +01:00
Jim Ferenczi 241c74efb2
upgrade to a new snapshot of Lucene 8 (7d0a7782fa) (#33812) 2018-09-18 18:16:40 +02:00
Alan Woodward 39c3234c2f
Upgrade to latest Lucene snapshot (#33505)
* LeafCollector.setScorer() now takes a Scorable
* Scorers may not have null Weights
* IndexWriter.getFlushingBytes() reports how much memory is being used by IW threads writing to disk
2018-09-10 20:51:55 +01:00
Jim Ferenczi 7ad71f906a
Upgrade to a Lucene 8 snapshot (#33310)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309

Closes #32899
2018-09-06 14:42:06 +02:00
Nicholas Knize e162127ff3 Upgrade to Lucene-7.5.0-snapshot-13b9e28f9d
The main feature is the inclusion of bkd backed geo_shape with
INTERSECT, DISJOINT, WITHIN bounding box and polygon query support.
2018-08-09 11:15:02 -05:00
Jim Ferenczi 53ff06e621
Upgrade to Lucene-7.5.0-snapshot-608f0277b0 (#32390)
The main highlight is the removal of the reclaim_deletes_weight in the TieredMergePolicy.
The es setting index.merge.policy.reclaim_deletes_weight is deprecated in this commit and the value is ignored. The new merge policy setting setDeletesPctAllowed should be added in a follow up.
2018-07-27 08:28:51 +02:00
Adrien Grand f023e95ae0
Upgrade to Lucene 7.4.0. (#31529)
This moves Elasticsearch from a recent 7.4.0 snapshot to the GA release.
2018-06-22 16:17:17 +02:00
Nhat Nguyen 8453ca638d
Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360) 2018-06-15 10:58:21 -04:00
Tanguy Leroux bf58660482
Remove all unused imports and fix CRLF (#31207)
The X-Pack opening and the recent other refactorings left a lot of 
unused imports in the codebase. This commit removes them all.
2018-06-11 15:12:12 +02:00
Nhat Nguyen abe61159a8
Upgrade to Lucene-7.4.0-snapshot-0a7c3f462f (#31073)
This snapshot includes:
- LUCENE-8341: Record soft deletes in SegmentCommitInfo which will resolve #30851
- LUCENE-8335: Enforce soft-deletes field up-front
2018-06-04 14:18:46 -04:00
Martijn van Groningen 544822c78b
Moved keyword tokenizer to analysis-common module (#30642)
Relates to #23658
2018-05-29 19:22:28 +02:00
Nhat Nguyen 363f1e84ca
Upgrade to Lucene-7.4-snapshot-1cbadda4d3 (#30928)
This snapshot includes LUCENE-8328 which is needed to stabilize CCR builds.
2018-05-29 12:29:52 -04:00
Nhat Nguyen 1918a30237
Upgrade to Lucene-7.4.0-snapshot-cc2ee23050 (#30778)
The new snapshot includes LUCENE-8324 which fixes missing checkpoint
after a fully deletes segment is dropped on flush. This snapshot should
resolves failed tests in the CorruptedFileIT suite.

Closes #30741
Closes #30577
2018-05-22 13:11:48 -04:00
Nhat Nguyen 67d8fc222d
Upgrade to Lucene-7.4.0-snapshot-59f2b7aec2 (#30726)
This snapshot resolves issues related to ShrinkIndexIT.
2018-05-18 18:21:39 -04:00
Nhat Nguyen 519768b5d3
Upgrade to Lucene-7.4-snapshot-6705632810 (#30519)
This snapshot is to include LUCENE-8298 which allows DocValues updates
to reset a value. This is needed for the Lucene rollback work.
2018-05-10 12:31:45 -04:00
Jim Ferenczi dbd857341f
Upgrade to 7.4.0-snapshot-1ed95c097b (#30357)
Upgrade to lucene-7.4.0-snapshot-1ed95c097b

This version contains:
* An Analyzer for Korean
* An IntervalQuery and IntervalsSource that retrieve minimum intervals of positional queries.
* A new API to retrieve matches (offsets and positions) of a query for a single document.
* Support for soft deletes in the index writer.
* A fixed shingle filter that handles index time synonyms.
* Support for emoji sequence in ICUTokenizer (with an upgrade to icu 61.1)
2018-05-04 11:44:22 +02:00
Alan Woodward dccd43af47
Upgrade to lucene 7.3.0 (#29387) 2018-04-05 10:34:44 +01:00
Adrien Grand 3bdfc8f3fb
Upgrade to lucene-7.3.0-snapshot-98a6b3d. (#29298)
Most notable changes include:
 - this release doesn't have the 7.2.1 version constant so I had to create one
 - spatial4j and jts were upgraded
2018-04-03 09:27:14 +02:00
Jim Ferenczi be012b1326
upgrade to lucene 7.2.1 (#28218) 2018-01-15 16:47:46 +01:00
Adrien Grand 77711508b0
Upgrade to Lucene 7.2.0. (#27910) 2017-12-20 14:17:40 +01:00
Adrien Grand 6323bb0d97
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27619)
This new snapshot mostly brings a change to TopFieldCollector which can now
early terminate collection when trackTotalHits is `false`.

As a follow-up, we should replace our usage of
`EarlyTerminatingSortingCollector` with this new option.
2017-12-04 09:40:08 +01:00
Adrien Grand 996990ad1f
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27496)
The main highlight of this new snapshot is that it introduces the opportunity
for queries to opt out of caching. In case a query opts out of caching, not only
will it never be cached, but also no compound query that wraps it will be
cached.
2017-11-28 14:52:42 +01:00
David Roberts 749c3ec716
Remove the single argument Environment constructor (#27235)
Only tests should use the single argument Environment constructor.  To
enforce this the single arg Environment constructor has been replaced with
a test framework factory method.

Production code (beyond initial Bootstrap) should always use the same
Environment object that Node.getEnvironment() returns.  This Environment
is also available via dependency injection.
2017-11-04 13:25:09 +00:00
Colin Goodheart-Smithe c1b8140c83
Upgrade to Lucene 7.1 (#27225) 2017-11-02 13:25:33 +00:00
Md. Abdulla-Al-Sun a40c474e10
Added Bengali Analyzer to Elasticsearch with respect to the lucene update(PR#238) 2017-10-05 13:25:05 +02:00
Martijn van Groningen dca787ed8a
upgrade to Lucene 7.1.0 snapshot version 2017-10-05 09:06:56 +02:00
Jason Tedor e0db89bc35 Upgrade to Lucene 7.0.0
This commit upgrades to the GA release of Luence 7!

Relates #26744
2017-09-21 19:19:33 -04:00
Adrien Grand 78681bc9e5 Upgrade to lucene-7.0.0-snapshot-d94a5f0. (#26441) 2017-08-31 09:06:40 +02:00
Adrien Grand f0c1e30544 Upgrade to lucene-7.0.0-snapshot-a128fcb. (#26090) 2017-08-08 13:03:19 +02:00
Adrien Grand 481d5d09b2 Upgrade to lucene-7.0.0-snapshot-00142c9. (#25641)
Lucene 7.0 is feature-frozen now, so there should not be many changes until GA.
2017-07-11 13:58:55 +02:00
Adrien Grand 44e9c0b947 Upgrade to lucene-7.0.0-snapshot-ad2cb77. (#25349)
Most notable changes:
 - better update concurrency: LUCENE-7868
 - TopDocs.totalHits is now a long: LUCENE-7872
 - QueryBuilder does not remove the boolean query around multi-term synonyms:
   LUCENE-7878
 - removal of Fields: LUCENE-7500

For the `TopDocs.totalHits` change, this PR relies on the fact that the encoding
of vInts and vLongs are compatible: you can write and read with any of them as
long as the value can be represented by a positive int.
2017-06-22 12:35:33 +02:00
Adrien Grand 0c117145f6 Upgrade to lucene-7.0.0-snapshot-92b1783. (#25222)
This snapshot has faster range queries on range fields (LUCENE-7828), more
accurate norms (LUCENE-7730) and the ability to use fake term frequencies
(LUCENE-7854).
2017-06-15 09:52:07 +02:00
Nicholas Knize deb7caf4d3 Upgrade to lucene-7.0.0-snapshot-a0aef2f
This commit upgrades master to a current lucene snapshot with commit id a0aef2f.
2017-05-19 10:20:55 -05:00
Ryan Ernst 2a65bed243 Tests: Change rest test extension from .yaml to .yml (#24659)
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
2017-05-16 17:24:35 -07:00
Nik Everett bb06d8ec4f Allow plugins to build pre-configured token filters (#24223)
This changes the way we register pre-configured token filters so that
plugins can declare them and starts to move all of the pre-configured
token filters out of core. It doesn't finish the job because doing
so would make the change unreviewably large. So this PR includes
a shim that keeps the "old" way of registering pre-configured token
filters around.

The Lowercase token filter is special because there is a "special"
interaction between it and the lowercase tokenizer. I'm not sure
exactly what to do about it so for now I'm leaving it alone with
the intent of figuring out what to do with it in a followup.

This also renames these pre-configured token filters from
"pre-built" to "pre-configured" because that seemed like a more
descriptive name.

This is a part of #23658
2017-05-09 14:50:49 -04:00
Nik Everett caf376c8af Start building analysis-common module (#23614)
Start moving built in analysis components into the new analysis-common
module. The goal of this project is:
1. Remove core's dependency on lucene-analyzers-common.jar which should
shrink the dependencies for transport client and high level rest client.
2. Prove that analysis plugins can do all the "built in" things by moving all
"built in" behavior to a plugin.
3. Force tests not to depend on any oddball analyzer behavior. If tests
need anything more than the standard analyzer they can use the mock
analyzer provided by Lucene's test infrastructure.
2017-04-19 18:51:34 -04:00