151 Commits

Author SHA1 Message Date
Nick Knize
50c3251d36 [Rename] o.e.common.settings (#336)
This commit refactors o.e.common.settings package to the
o.opensearch.common.setttings namespace. All references throughout the codebase
are refactored.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-03-21 20:56:34 -05:00
Harold Wang
8ee7296fea [Rename] org.elasticsearch.plugins (#348)
* Rename org.elasticsearch.gateway to org.opensearch.gateway

Signed-off-by: Harold Wang <harowang@amazon.com>

* Rename org.elasticsearch.http to org.opensearch.http

Signed-off-by: Harold Wang <harowang@amazon.com>

* Renames org.elasticsearch.plugins to org.opensearch.plugins

Signed-off-by: Harold Wang <harowang@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize
3fc453eace [Rename] o.e.common.util (#337)
This commit refactors the o.e.common.util package to the
o.opensearch.common.util namespace. All references throughout the codebase have
been refactored.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-03-21 20:56:34 -05:00
Nick Knize
f216f2e556 [Rename] o.e.common.logging,lucene (#335)
This commit refactors the following packages:

* o.e.common.logging
* o.e.common.lucene

to the o.opensearch.common parent package. References throughout the codebase
have also been refactored.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-03-21 20:56:34 -05:00
Nick Knize
1664f9e495 [Rename] o.e.action.search (#344)
This commit refactors the o.e.action.search package to o.opensearch.action
namespace. All references throughout the code are also refactored.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2021-03-21 20:56:34 -05:00
Nick Knize
946c7bb2dc [Rename] o.e.common subpackages round 1 (#332)
* [Rename] o.e.common subpackages round 1

This commit refactors the following subpackages of o.e.common:

* o.e.common.joda
* o.e.common.lease
* o.e.common.metrics
* o.e.common.network
* o.e.common.path
* o.e.common.recycling
* o.e.common.regex
* o.e.common.rounding
* o.e.common.text
* o.e.common.time
* o.e.common.transport

to the o.opensearch namespace. All references throughout the codebase have been
refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>

* fix imports 1

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize
c4565adc9d [Rename] o.e.common.geo, hash, io (#317)
This commit refactors the following packages:

* o.e.common.geo
* o.e.common.hash
* o.e.common.io

into the o.opensearch.common namespace. All references throughout the codebase
have been refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Sarat Vemulapalli
d65bccc25d Renaming server/env to OpenSearch (#314)
Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>
2021-03-21 20:56:34 -05:00
Rabi Panda
63edce1243 [Rename] refactor more of o.e.search package in the server module. (#306)
As part of this commit we refactor the following in the o.e.search package:

- rename `org.elasticsearch.search.fetch` to `org.opensearch.search.fetch`
- rename `org.elasticsearch.search.internal` to `org.opensearch.search.internal`
- rename `org.elasticsearch.search.profile` to `org.opensearch.search.profile`
- rename `org.elasticsearch.search.query` to `org.opensearch.search.query`
- rename `org.elasticsearch.search.suggest` to `org.opensearch.search.suggest`
- rename other instances of Elasticsearch to OpenSearch in these packages.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Nick Knize
dafc0510ea [Rename] o.e.common classes (#305)
This commit refactors classes under o.e.common to o.opensearch.common. All
references throughout the codebase have also been refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize
5ad13ca2f7 [Rename] o.e.cluster (#297)
This commit refactors the remaining o.e.cluster packages to
o.opensearch.cluster. All references throughout the codebase are also
refactored.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Rabi Panda
f17ba72a98 [Rename] refactor some of the o.e.search packages in the server module. (#294)
Refactoring:

- rename `org.elasticsearch.search.builder` to `org.opensearch.search.builder`
- rename `org.elasticsearch.search.collapse` to `org.opensearch.search.collapse`
- rename `org.elasticsearch.search.dfs` to `org.opensearch.search.dfs`
- rename `org.elasticsearch.search.lookup` to `org.opensearch.search.lookup`
- rename `org.elasticsearch.search.lookup` to `org.opensearch.search.lookup`
- rename `org.elasticsearch.search.rescore` to `org.opensearch.search.rescore`
- rename `org.elasticsearch.search.searchafter` to `org.opensearch.search.searchafter`
- rename `org.elasticsearch.search.slice` to `org.opensearch.search.slice`
- rename `org.elasticsearch.search.sort` to `org.opensearch.search.sort`

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Nick Knize
fe2b5d6d39 [Rename] o.e.version (#296)
This commit refactors o.e.Version to o.opensearch.Version. This is retained in a
single commit to serve as a reference for re-versioning the opensearch codebase
from legacy 7.10 to 1.0.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize
452f6e1b81 [Rename] server cli and client (#254)
This commit refactors the o.e.cli and o.e.client packages from elasticsearch to
o.opensearch.cli and o.opensearch.client packages in the server module,
respectively.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Nick Knize
fe7f29f549 [Rename] o.e.cluster.health,metadata,node (#283)
This commit refactors the following subpackages:

* o.e.cluster.health
* o.e.cluster.metadata
* o.e.cluster.node

to o.opensearch.cluster.*. All other references throughout the codebase are
updated.

Signed-off-by: Nicholas Knize <nknize@amazon.com>
2021-03-21 20:56:34 -05:00
Rabi Panda
6ee930ac0d [Rename] refactor o.e.repositories in the server module. (#275)
Refactor the repositories package in the server module to rename the package from `org.elasticsearch.repositories` to `org.opensearch.repositories`

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Rabi Panda
16c3b54639 [Rename] refactor o.e.script package in server module. (#250)
Refactor the package`org.elasticsearch.script` in server module to rename it to`org.opensearch.script`.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Rabi Panda
dc4736dca1 [Rename] refactor server/threadpool package. (#267)
Refactor the server/threadpool package to rename the package names from`org.elasticsearch.threadpool` to `org.opensearch.threadpool`.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Rabi Panda
584efd7970 [Rename] modules/lang-painless (#210)
Refactor lang-painless module as part rename to OpenSearch work.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Rabi Panda
abeb41b486 [Rename] modules/analysis common (#200)
This commit refactors the common-analysis module as part of the Elasticsearch to OpenSearch renaming.

Signed-off-by: Rabi Panda <adnapibar@gmail.com>
2021-03-21 20:56:34 -05:00
Alan Woodward
fb84b6710d
Restore use of default search and search_quote analyzers (#65491) (#65562)
In the refactoring of TextFieldMapper, we lost the ability to define
a default search or search_quote analyzer in index settings. This
commit restores that ability, and adds some more comprehensive
testing.

Fixes #65434
2020-11-26 18:34:59 +00:00
Przemyslaw Gomulka
9f566644af
Do not create two loggers for DeprecationLogger backport(#58435) (#61530)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.

depends on #61515
backports #58435
2020-08-26 16:04:02 +02:00
Przemyslaw Gomulka
f3f7d25316
Header warning logging refactoring backport(#55941) (#61515)
Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding a response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing A ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed.

relates #55699
relates #52369
backports #55941
2020-08-25 16:35:54 +02:00
Jake Landis
92ce41cfaf
[7.x] Introduce javaRestTest source set/task and convert modules (#59939) (#60026)
Introduce a javaRestTest source set and task to compliment the yamlRestTest.
javaRestTest differs such that the code is sourced from Java and may have
different dependencies and setup requirements for the test clusters. This also
allows the tests to run in parallel in different cluster instances to prevent any
cross test contamination between the two types of tests.

Included in this PR is all :modules no longer use the integTest task. The tests
are now driven by test, yamlRestTest, javaRestTest, and internalClusterTest.
Since only :modules (and :rest-api-spec) have been converted to yamlRestTest
we can now disable the integTest task if either yamlRestTest or javaRestTest have
been applied. Once all projects are converted, we can delete the integTest task.

related: #56841
related: #59444
2020-07-28 08:39:11 -05:00
malpani
0555fef799 Support ignore_keywords flag for word delimiter graph token filter (#59563)
This commit allows customizing the word delimiter token filters to skip processing 
tokens tagged as keyword through the `ignore_keywords` flag Lucene's 
WordDelimiterGraphFilter already exposes.

Fix for #59491
2020-07-21 16:11:55 +01:00
Jake Landis
665b7b7bd8
Convert modules to use yamlRestTest (#59089) (#59446)
This commit moves the modules REST tests to the
newly introduced yamlRestTest source set. A few
tests have also been re-named to include the correct
IT suffix. Without changing the names, the testing
conventions task would fail since now that the YAML
tests are no longer present pacify the convention.
These tests have moved to the internalClusterTest
source set.

related: #56841
2020-07-13 13:53:05 -05:00
Jake Landis
604c6dd528
7.x - Create plugin for yamlTest task (#56841) (#59090)
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.

The remaining cases in modules, plugins, and x-pack will be handled in followups.

This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.

The YAML based REST tests will be moved to separate source sets (yamlRestTest).
The which source is the target for the test resources is dependent on if this
new plugin is applied. If it is not applied, it will default to the test source
set.

Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).

As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.

Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
2020-07-06 14:16:26 -05:00
Tomasz Elendt
a7c36c8af5 Support multiple tokens on LHS in stemmer_override rules (#56113) (#56484)
This commit adds support for rules with multiple tokens on LHS, also
known as "contraction rules", into stemmer override token
filter. Contraction rules are handy into translating multiple
inflected words into the same root form. One side effect of this change is
that it brings stemmer override rules format closer to synonym rules
format so that it makes it easier to translate one into another.

This change also makes stemmer override rules parser more strict so
that it should catch more errors which were previously accepted.

Closes #56113
2020-05-29 22:34:31 +02:00
Andrei Balici
19a336e8d3 Add max_token_length setting to the CharGroupTokenizer (#56860)
Adds `max_token_length` option to the CharGroupTokenizer.
Updates documentation as well to reflect the changes.

Closes #56676
2020-05-20 14:28:40 +02:00
markharwood
e197b6c45b
Analysis enhancement - add preserve_original setting in ngram-token-filter (#55432) (#56100)
Authored-by: Amit Khandelwal <amitmbm87@gmail.com>
2020-05-04 11:31:28 +01:00
Amit Khandelwal
126e4acca8 Expose preserve_original in edge_ngram token filter (#55766)
The Lucene `preserve_original` setting is currently not supported in the `edge_ngram`
token filter. This change adds it with a default value of `false`.

Closes #55767
2020-04-28 10:24:27 +02:00
Rory Hunter
d66af46724
Always use deprecateAndMaybeLog for deprecation warnings (#55319)
Backport of #55115.

Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
2020-04-23 09:20:54 +01:00
David Turner
7941f4a47e Add RepositoriesService to createComponents() args (#54814)
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.

After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
2020-04-16 16:27:36 +01:00
Jason Tedor
5fcda57b37
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 17:24:38 -04:00
Jake Landis
db3420d757
[7.x] Optimize which Rest resources are used by the Rest tests… (#53766)
This should help with Gradle's incremental compile such that projects
only depend upon the resources they use.

related #52114
2020-03-19 12:28:59 -05:00
Jay Modi
f3f6ff97ee
Single instance of the IndexNameExpressionResolver (#52604)
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.

In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.

Backport of #52596
2020-02-21 07:50:02 -07:00
Adrien Grand
ad9d2f1922
Move analysis/mappings stats to cluster-stats. (#51875)
Closes #51138
2020-02-05 11:02:25 +01:00
Marios Trivyzas
fda25ed04a
Fix caching for PreConfiguredTokenFilter (#50912) (#51091)
The PreConfiguredTokenFilter#singletonWithVersion uses the version
internally for the token filter factories but it registers only one
instance in the cache and not one instance per version. This can lead
to exceptions like the one described in #50734 since the singleton is
created and cached using the version created of the first index
that is processed.

Remove the singletonWithVersion() methods and use the
elasticsearchVersion() methods instead.

Fixes: #50734
(cherry picked from commit 24e1858)
2020-01-16 13:58:02 +01:00
Christoph Büscher
2f13751bad
Deprecate and remove camel-case nGram and edgeNGram tokenizers (#50862) (#50991)
We deprecated and removed the camel-case versions of the nGram and edgeNGram
filters a while ago and we should do the same with the nGram and edgeNGram tokenizers.
This PR deprecates the use of these names in favour of ngram and edge_ngram in
7. Usage will be disallowed on new indices starting with 8 then.
2020-01-14 21:42:34 +01:00
Alan Woodward
4974f56b25 Fix analysis BWC tests - warnings now emitted on index creation 2020-01-14 14:48:40 +00:00
Alan Woodward
8c16725a0d Check for deprecations when analyzers are built (#50908)
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.

This is worse in the case where components throw exceptions on upgrade. In this case, users
will not be aware of a problem until a document is indexed, instead of at index creation time.

This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
and exceptions are now emitted when indexes are created or opened.

Fixes #42349
2020-01-14 13:52:02 +00:00
Christoph Büscher
b1b4282273 Make Multiplexer inherit filter chains analysis mode (#50662)
Currently, if an updateable synonym filter is included in a multiplexer filter,
it is not reloaded via the _reload_search_analyzers because the multiplexer
itself doesn't pass on the analysis mode of the filters it contains, so its not
recognized as "updateable" in itself. Instead we can check and merge the
AnalysisMode settings of all filters in the multiplexer and use the resulting
mode (e.g. search-time only) for the multiplexer itself, thus making any synonym
filters contained in it reloadable.  This, of course, will also make the
analyzers using the multiplexer be usable at search-time only.

Closes #50554
2020-01-08 22:12:01 +01:00
Christoph Büscher
6258d25458
Log deprecation for nGram and edgeNGram custom filters (#50376) (#50445)
The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We
currently throw errors on new indices when they are used. However these errors
are currently only thrown for pre-configured filters, adding them as custom
filters doesn't trigger the warning and error. This change adds the appropriate
deprecation warnings for `nGram` and `edgeNGram` respectively on version 7
indices.

Relates #50360
2019-12-20 22:00:08 +01:00
Stuart Tettemer
689df1f28f
Scripting: ScriptFactory not required by compile (#50344) (#50392)
Avoid backwards incompatible changes for 8.x and 7.6 by removing type
restriction on compile and Factory.  Factories may optionally implement
ScriptFactory.  If so, then they can indicate determinism and thus
cacheability.

**Backport**

Relates: #49466
2019-12-19 12:50:25 -07:00
Stuart Tettemer
17cda5b2c0
Scripting: Groundwork for caching script results (#49895) (#49944)
In order to cache script results in the query shard cache, we need to
check if scripts are deterministic.  This change adds a default method
to the script factories, `isResultDeterministic() -> false` which is
used by the `QueryShardContext`.

Script results were never cached and that does not change here.  Future
changes will implement this method based on whether the results of the
scripts are deterministic or not and therefore cacheable.

Refs: #49466

**Backport**
2019-12-06 15:08:05 -07:00
Christoph Büscher
4ffa050735 Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers
only allows for a list of predefined character classes, which might not fit
every use case. For example, including underscore "_" in a token would currently
require the `punctuation` class which comes with a lot of other characters.
This change adds an additional "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which will be interpreted as a set of characters to inlcude into a token.

Closes #25894
2019-11-20 10:37:12 +01:00
gpaimla
7d20b50f45 Implement Lucene EstonianAnalyzer, Stemmer (#49149)
This PR adds a new analyzer and stemmer for the Estonian language.

Closes #48895
2019-11-18 17:24:21 +01:00
Rory Hunter
c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Rory Hunter
3c77c50f5f
Improve resiliency to auto-formatting in libs, modules (#48619)
Backport of #48448. Make a number of changes so that code in the libs and
modules directories are more resilient to automatic formatting. This covers:

* Remove string concatenation where JSON fits on a single line
* Move some comments around to they aren't auto-formatted to a strange
  place
2019-10-29 10:39:34 +00:00
Alan Woodward
697c693ee7 Reset Token position on reuse in scripted analysis (#47424)
Most of the information in AnalysisPredicateScript.Token is pulled directly
from its underlying AttributeSource, but we also keep track of the token position,
and this state is held directly on the Token. This information needs to be reset when
the containing ScriptFilteringTokenFilter or ScriptedConditionTokenFilter is re-used.

Fixes #47197
2019-10-02 11:27:04 +01:00