Commit Graph

330 Commits

Author SHA1 Message Date
Nick Knize 3d49ccead2
[Upgrade] Lucene-9.2-snapshot (#2924) 2022-04-21 14:36:01 -07:00
Owais Kazi 3c5d997a76
Added a new line linter (#2875)
* Added linter to add new line

Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>

* Fixed new lines

Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>

* Ignore empty files

Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>

* Updated DEVELOPER GUIDE

Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>

* Renamed workflow file

Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>

* Fixed failing tests

Signed-off-by: Owais Kazi <owaiskazi19@gmail.com>
2022-04-13 14:14:18 -04:00
Wenjun Ruan 0b1f4a2069
Rename `file` to `dir` in Environment (#2730)
* Rename `file` to `dir` in Environment

Signed-off-by: ruanwenjun <wenjun@apache.org>

* Fix compile error

Signed-off-by: ruanwenjun <wenjun@apache.org>

* fix compile error

Signed-off-by: ruanwenjun <wenjun@apache.org>
2022-04-05 09:23:08 -04:00
Nick Knize db7f4b0848
[Upgrade] Lucene 9.1 release (#2560)
Upgrades to the official 9.1 release

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-03-23 07:26:18 -05:00
Nick Knize e0ebf40964
[Upgrade] ICU4j from 68.2 to 70.1 (#2504)
Upgrade ICU4j to 70.1 to be consistent with Lucene 9.1 dependency.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-03-18 11:28:48 -05:00
Nick Knize 3675400b92
[Remove] type from CIR.mapping and CIRB.mapping (#2478)
First pass to remove types from CreateIndexRequest and CreateIndexRequestBuilder
mapping method. This method is overloaded several times so the most widely used
methods in the RequestBuilder are refactored from mapping to setMapping to avoid
confusion, conflicts, and to be consistent with other method names (e.g.,
setSettings, setCause, setAlias).

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-03-16 17:35:08 -05:00
Nick Knize 05a5819243
[Upgrade] Lucene 9.1.0-snapshot-ea989fe8f30 (#2487)
* [Upgrade] Lucene 9.1.0-snapshot-ea989fe8f30

Upgrades from Lucene 9.0.0 to 9.1.0-snapshot-ea989fe8f30 in preparation for
9.1.0 GA.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>

* Add spanishplural token filter

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>

* fix KNOWN_TOKENIZERS

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-03-16 15:47:25 -04:00
Nick Knize 006c832c5f
[Upgrade] Lucene 9.0.0 release (#1109)
This commit upgrades the core codebase from Lucene 8.10.1 to
lucene 9.0.0. It includes all necessary refactoring of features and
API changes when upgrading to a new major Lucene release.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
Co-authored-by: Andriy Redko <drreta@gmail.com>
2022-03-15 15:48:13 -05:00
Nick Knize 0bd7850bed
[Remove] remaining type usage in Client and AbstractClient (#2258)
Removes type parameter from remaining prepareIndex in Client and AbstractClient.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-02-25 13:35:48 -06:00
Nick Knize 7fe642fda5
[Remove] Type from Search Internals (#2109)
With types deprecation the type support is removed from internal search API
(SearchRequest and QueryShardContext).

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-02-17 16:17:31 -06:00
Nick Knize 6428d1a2a6
[Refactor] MapperService to QueryShardContext in valueFetcher (#2027)
* [Refactor] MapperService to QueryShardContext in valueFetcher

QueryShardContext encapsulates MapperService along with other mechanisms useful
for analyzers, highlighters, and readers. This commit refactors the valueFetcher
to use the QueryShardContext as a parameter instead of direct access to the
MapperService. This is an API change. Behavior of the method is unaffected.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>

* change unnecessary Optional usage

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>

* spotlessApply

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-02-08 08:38:20 -05:00
Martin Gaievski 309397520c
Adding support for JDK17 and removing JDK8 (#2025)
* Adding support for JDK17 and removing JDK8

Signed-off-by: Martin Gaievski <gaievski@amazon.com>

* Merge overlaping PR, bumping min java version to 11

Signed-off-by: Martin Gaievski <gaievski@amazon.com>

* Removing references to JDK8 from dev guide

Signed-off-by: Martin Gaievski <gaievski@amazon.com>
2022-02-02 20:59:10 -05:00
Nick Knize 5550f8d7e2
[Remove] Analyzer Deprecations (#1741)
This commit removes deprecated analyzer instantiation that is no longer
permitted in OpenSearch 2.0.0.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-12-16 12:51:49 -06:00
Vacha e66ea2c4f3
Avoid logging duplicate deprecation warnings multiple times (#1660)
* Avoid logging duplicate deprecation warnings multiple times

Signed-off-by: Vacha <vachshah@amazon.com>

* Fixes test failures

Signed-off-by: Vacha <vachshah@amazon.com>

* Adding deprecation logger tests

Signed-off-by: Vacha <vachshah@amazon.com>

* Using ConcurrentHashMap keySet

Signed-off-by: Vacha Shah <vachshah@amazon.com>
2021-12-15 15:26:44 -08:00
Himanshu Setia 681e5548c1
Enabling spotless, disabling checkstyle check on plugins (#1488)
* Enabling spotless, disabling checkstyle on below modules

:plugins:mapper-annotated-text
:plugins:mapper-murmur3
:plugins:mapper-size
:plugins:repository-azure
:plugins:repository-gcs
:plugins:repository-hdfs
:plugins:repository-s3
:plugins:store-smb
:plugins:transport-nio
:qa:die-with-dignity

Signed-off-by: Himanshu Setia <setiah@amazon.com>

* Enabling spotless for more plugins

Signed-off-by: Himanshu Setia <setiah@amazon.com>

* Fixing error in merge conflict

Signed-off-by: Himanshu Setia <setiah@amazon.com>
2021-11-01 17:40:06 -07:00
Nick Knize 53334b2ce4
Upgrade to Lucene 8.10.1 (#1440)
This commit upgrades to the latest release of lucene 8.10

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-10-28 10:06:53 -05:00
Nick Knize 5ae00456a0
Upgrade to Lucene 8.9 (#1080)
This commit upgrades to the official lucene 8.9 release

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-08-20 11:28:06 -05:00
Nick Knize c5a3c3cb41
Update lucene version to 8.8.2 (#557)
This commit updates the codebase to the latest released version of Lucene.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-04-23 09:48:41 -05:00
Nick Knize 0ba0e7cc26
[Versioning] Rebase to OpenSearch version 1.0.0 (#555)
This commit rebases the versioning to OpenSearch 1.0.0

Co-authored-by: Rabi Panda <adnapibar@gmail.com>

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-04-15 17:06:47 -05:00
Nick Knize 9168f1fb43
[License] Add SPDX and OpenSearch Modification license header (#509)
This commit adds the SPDX Apache-2.0 license header along with an additional
copyright header for all modifications.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-04-09 14:28:18 -05:00
Nick Knize 5b46a05702 [Rename] remaining packages and resources in test/fixture (#364)
This commit refactors the remaining o.e.index and o.e.test packages in the
test/fixtures module. References throughout the codebase are also refactored.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-03-21 20:56:34 -05:00
Harold Wang 82f9ff93cb [Rename] plugins (#193)
* [Rename] plugins (#193)

This PR refactors files under "plugins" folders part of the Elasticsearch to OpenSearch renaming effort.

Signed-off-by: Harold Wang <harowang@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize ccceb381db [Rename] ElasticsearchException class in server module (#165)
This commit refactors the ElasticsearchException class located in the server module
to OpenSearchException. References and usages throughout the rest of the
codebase are fully refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Ignacio Vera 4851bc7bae
Upgrade to Lucene-8.7.0 (#64532) (#64537) 2020-11-03 16:57:04 +01:00
Ignacio Vera d0f5066310
Upgrade to lucene-8.7.0-snapshot-72d8528c3a6 (#63912) (#63928) (#63933) 2020-10-20 15:08:06 +02:00
Mayya Sharipova e022b78198
Upgrade to lucene-8.7.0-snapshot-5c4168d (#63466)
This disables sort optim on _doc, which may still be unstable.
Backport for #63444
2020-10-08 08:20:43 -04:00
Mayya Sharipova e236ea43e9 Upgrade to lucene-8.7.0-snapshot-e914862 (#63401)
Backport for: #63395
2020-10-07 09:45:14 -04:00
Mayya Sharipova f2ba62b894
Upgrade to lucene- 8.7.0-snapshot-66c49a35402 (#63372)
This includes fixing a bug in doc iteration during sort optimization

Backport for #63349
2020-10-06 22:38:58 -04:00
Julie Tibshirani f17ca18dfa
Make array value parsing flag more robust. (#63371)
When constructing a value fetcher, the 'parsesArrayValue' flag must match
`FieldMapper#parsesArrayValue`. However there is nothing in code or tests to
help enforce this.

This PR reworks the value fetcher constructors so that `parsesArrayValue` is
'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must
explicitly set it to true and ensure the behavior is covered by tests.

Follow-up to #62974.
2020-10-06 17:49:25 -07:00
Nhat Nguyen 1a6837883a Upgrade to Lucene-8.7.0-snapshot-77396dbf339 (#63222)
Includes LUCENE-9554, which exposes the pendingNumDocs from IndexWriter.
2020-10-05 14:39:30 -04:00
Alan Woodward 01950bc80f
Move FieldMapper#valueFetcher to MappedFieldType (#62974) (#63220)
For runtime fields, we will want to do all search-time interaction with
a field definition via a MappedFieldType, rather than a FieldMapper, to
avoid interfering with the logic of document parsing. Currently, fetching
values for runtime scripts and for building top hits responses need to
call a method on FieldMapper. This commit moves this method to
MappedFieldType, incidentally simplifying the current call sites and freeing
us up to implement runtime fields as pure MappedFieldType objects.
2020-10-04 14:54:59 +01:00
Mayya Sharipova 4c8c3c8df6
Upgrade lucene to lucene-8.7.0-snapshot-3b59906 (#62978)
Backport for #62970
2020-09-28 16:52:31 -04:00
Alan Woodward e28750b001
Add parameter update and conflict tests to MapperTestCase (#62828) (#62902)
This commit adds a mechanism to MapperTestCase that allows implementing
test classes to check that their parameters can be updated, or throw conflict
errors as advertised. Child classes override the registerParameters method
and tell the passed-in UpdateChecker class about their parameters. Simple
conflicts can be checked, using the existing minimal mappings as a base to
compare against, or alternatively a particular initial mapping can be provided
to check edge cases (eg, norms can be updated from true to false, but not
vice versa). Updates are registered with a predicate that checks that the update
has in fact been applied to the resulting FieldMapper.

Fixes #61631
2020-09-24 20:38:12 +01:00
Luca Cavanna 862fab06d3
Share same existsQuery impl throughout mappers (#57607)
Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers.

There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available.

This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method.

At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.
2020-09-23 11:00:53 +02:00
Luca Cavanna 5ca86d541c
Move stored flag from TextSearchInfo to MappedFieldType (#62717) (#62770) 2020-09-23 09:40:34 +02:00
markharwood a0df0fb074
Search - add case insensitive flag for "term" family of queries #61596 (#62661)
Backport of fe9145f

Closes #61546
2020-09-22 13:56:51 +01:00
Luca Cavanna 9ae29713fd
Dense vector field type minor fixes (#62631)
The dense vector field is not aggregatable although it produces fielddata through its BinaryDocValuesField. It should pass up hasDocValues set to true to its parent class in its constructor, and return isAggregatable false. Same for the sparse vector field (only in 7.x).

This may not have consequences today, but it will be important once we try to share the same exists query implementation throughout all of the mappers with #57607.
2020-09-22 10:40:51 +02:00
Adrien Grand 4de8579455
Upgrade to lucene-8.7.0-snapshot-830bd186a8d. (#62596) 2020-09-18 09:51:34 +02:00
Adrien Grand 9a8225bbc1
Upgrade to lucene-8.7.0-snapshot-9cd3af50f80. (#62450) (#62476)
This new snapshot contains the following JIRAs that we're interested in:
 - [LUCENE-9525](https://issues.apache.org/jira/browse/LUCENE-9525)
Better handling of small documents. This should improve retrieval times
when documents are less than ~1kB.
 - [LUCENE-9510](https://issues.apache.org/jira/browse/LUCENE-9510)
Faster flushes when index sorting is enabled by not compressing the
temporary files that store stored fields and term vectors.
2020-09-17 10:28:20 +02:00
Nik Everett 24a24d050a
Implement fields fetch for runtime fields (backport of #61995) (#62416)
This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.

While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.

Relates to #59332
2020-09-15 20:24:10 -04:00
Adrien Grand 6db8afefc2
Upgrade to lucene-8.7.0-snapshot-cdfdc1e0851. (#62376)
Upgrade to a new Lucene snapshot that (at least partially) addresses the
indexing rate regression when index sorting is enabled.

Backport of #62334.
2020-09-15 17:48:07 +02:00
Ignacio Vera c8981ea93d
upgrade to lucene-8.7.0-snapshot-b313618cc1d (#62213) (#62222) 2020-09-10 16:23:18 +02:00
Ignacio Vera 31c026f25c
upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957) (#61974) 2020-09-04 13:46:20 +02:00
Jason Tedor 64cd229b35
Upgrade to Lucene 8.6.2 (#61688)
This commit upgrades the Lucene dependencies to 8.6.2.
2020-08-31 09:54:07 -04:00
Luca Cavanna f769821bc8
Pass SearchLookup supplier through to fielddataBuilder (#61430) (#61638)
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.

To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.

As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.

With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.

Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.

This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.

Relates to #59332

Co-authored-by: Nik Everett <nik9000@gmail.com>
2020-08-27 18:09:56 +02:00
Przemyslaw Gomulka 9f566644af
Do not create two loggers for DeprecationLogger backport(#58435) (#61530)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.

depends on #61515
backports #58435
2020-08-26 16:04:02 +02:00
Nik Everett 87cf81e179
Migrate some more mapper test cases (#61507) (#61552)
Migrate some more mapper test cases from `ESSingleNodeTestCase` to
`MapperTestCase`.
2020-08-25 15:27:26 -04:00
markharwood 8b56441d2b
Search - add case insensitive support for regex queries. (#59441) (#61532)
Backport to add case insensitive support for regex queries. 
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master.
This can be removed when 8.7 Lucene is released.

Closes #59235
2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka f3f7d25316
Header warning logging refactoring backport(#55941) (#61515)
Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding a response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing A ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed.

relates #55699
relates #52369
backports #55941
2020-08-25 16:35:54 +02:00
Julie Tibshirani 997c73ec17
Correct how field retrieval handles multifields and copy_to. (#61391)
Before when a value was copied to a field through a parent field or `copy_to`,
we parsed it using the `FieldMapper` from the source field. Instead we should
parse it using the target `FieldMapper`. This ensures that we apply the
appropriate mapping type and options to the copied value.

To implement the fix cleanly, this PR refactors the value parsing strategy. Now
instead of looking up values directly, field mappers produce a helper object
`ValueFetcher`. The value fetchers are responsible for almost all aspects of
fetching, including looking up the right paths in the _source.

The PR is fairly big but each commit can be reviewed individually.

Fixes #61033.
2020-08-20 15:53:35 -07:00