Add `irish` analyzer
Add `sorani` analyzer (Kurdish)
Add `classic` tokenizer: specific to English text and tries to recognize hostnames, companies, acronyms, etc.
Add `thai` tokenizer: segments Thai text into words.
Add `classic` tokenfilter: cleans up acronyms and possessives from classic tokenizer
Add `apostrophe` tokenfilter: removes text after apostrophe and the apostrophe itself
Add `german_normalization` tokenfilter: umlaut/sharp S normalization
Add `hindi_normalization` tokenfilter: accounts for Hindi spelling differences
Add `indic_normalization` tokenfilter: accounts for different unicode representations in Indian languages
Add `sorani_normalization` tokenfilter: normalizes Kurdish text
Add `scandinavian_normalization` tokenfilter: normalizes Norwegian, Danish, Swedish text
Add `scandinavian_folding` tokenfilter: much more aggressive form of `scandinavian_normalization`
Add additional languages to stemmer tokenfilter: `galician`, `minimal_galician`, `irish`, `sorani`, `light_nynorsk`, `minimal_nynorsk`
Add support for accessing the default Thai stopword set `_thai_`
Fix some bugs and broken links in documentation.
Closes #5935
`-Dtests.filter` allows passing filter expressions to the Elasticsearch
tests. This makes it possible to filter tests annotated with TestGroup annotations
like @Slow, @Nightly, @Backwards, @Integration with a boolean expression like:
* to run only backwards tests run:
`mvn -Dtests.bwc.version=X.Y.Z -Dtests.filter="@backwards"`
* to run all integration tests but skip slow tests run:
`mvn -Dtests.filter="@integration and not @slow"`
* to take defaults into account, i.e. run all tests as well as backwards tests:
`mvn -Dtests.filter="default and @backwards"`
This feature is a more powerful alternative to flags like
`-Dtests.nightly=true|false` etc.
Closes #6703
For the casual reader, the reference to "term queries" may be glossed over, yielding an unexpected result when using `regexp` queries.
This attempts to make that distinction more prominent.
Closes #6698
today we track both the index name and type for mapping updates in the shard bulk action, but we only work against one index at this level, so there is no need to track the index name itself
closes #6695
Both the Master and Node fault detection register themselves to be notified when a node disconnects, so they can respond to it accordingly. As such, when a ConnectionTransportException was raised on a ping request, it was not handled there, as it is already handled somewhere else. However, this introduces a race condition if the disconnect happens during a period where there is no current master (minimum_master_node breach), at which time the fault detection is not active. In this case, we will only discover the disconnect error during the ping request, so we have to respond accordingly.
Closes #6686
randomization
We randomize the XContentType to test deriving the content type on all
APIs. Yet, BWC tests run against versions where CBOR wasn't around.
This commit ensures we don't use CBOR when the compatibility version is
less than `1.2.0`.
Closes #6691
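The guard described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `ContentType` enum, the dotted version strings, and the helper names `before` and `randomContentType` are hypothetical stand-ins, not the actual test infrastructure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; names and types are assumptions, not the real test classes.
final class ContentTypeRandomizationSketch {

    // Stand-in for the content types the tests pick from.
    enum ContentType { JSON, SMILE, YAML, CBOR }

    /** Returns true if the dotted numeric version is strictly older than the other one. */
    static boolean before(String version, String other) {
        String[] a = version.split("\\.");
        String[] b = other.split("\\.");
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(Integer.parseInt(a[i]), Integer.parseInt(b[i]));
            if (cmp != 0) {
                return cmp < 0;
            }
        }
        return a.length < b.length;
    }

    /** Picks a random content type, leaving CBOR out when the compatibility version predates 1.2.0. */
    static ContentType randomContentType(Random random, String compatibilityVersion) {
        List<ContentType> candidates = new ArrayList<>();
        for (ContentType type : ContentType.values()) {
            if (type == ContentType.CBOR && before(compatibilityVersion, "1.2.0")) {
                continue; // older nodes cannot parse CBOR
            }
            candidates.add(type);
        }
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

Comparing version components numerically rather than as strings keeps a version such as `1.10.0` from being treated as older than `1.2.0`.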
The concrete DocMapper on the master is updated before the mapping in the cluster state; the DocMapper is updated during the cluster update task. This can lead to occasional assertion failures on the mapping response, because the response is based on the mapping in the cluster state, which may not yet have been updated (there is a time window in which the DocMapper has been updated but the mapping in the cluster state has not).
when an indexing request introduces a new mapping, today we rely on the parsing logic to mark it as modified on the "first" parsing phase. This can cause mapping updates to be sent to the master needlessly, even when the mapping has already been introduced through the create index/put mapping case.
This bubbled up in the disabled field data format test, where we explicitly define mappings so that the update mapping behavior does not happen, yet it still happens because of the current logic. Because our test randomly delays the introduction of any mapping updates, a delayed update can get in and override updated ones.
closes #6669
In case an update request failed (for example when updating with a
wrongly formatted date), the returned index operation type was index
instead of update.
Closes #6630
At the moment the IndexingMemoryController can try to update the index buffer memory of shards at any given moment. This update involves a flush, which may cause a FlushNotAllowedEngineException to be thrown in a concurrently finalizing recovery.
Closes #6642, closes #6667
Due to a bogus if-check in SearchSourceBuilder.fetchSource(String include, String exclude)
the excludes only got set when the includes were not null. Fixed this and added some
basic tests.
Closes #6632
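To illustrate the kind of check involved, here is a small sketch. `FetchSourceSketch` is a hypothetical stand-in for the builder, and the buggy variant is an assumed reconstruction of the faulty condition based on the description above, not a copy of the original code.

```java
// Hypothetical stand-in for the builder; not the real SearchSourceBuilder.
final class FetchSourceSketch {
    private static final String[] EMPTY = new String[0];

    String[] includes = EMPTY;
    String[] excludes = EMPTY;

    // Assumed shape of the bug: the exclude is guarded by the include's null check,
    // so it is dropped whenever no include is given.
    FetchSourceSketch fetchSourceBuggy(String include, String exclude) {
        includes = include == null ? EMPTY : new String[] {include};
        excludes = include == null ? EMPTY : new String[] {exclude};
        return this;
    }

    // Fixed shape: each argument is checked against itself.
    FetchSourceSketch fetchSource(String include, String exclude) {
        includes = include == null ? EMPTY : new String[] {include};
        excludes = exclude == null ? EMPTY : new String[] {exclude};
        return this;
    }
}
```

With the fixed variant, calling `fetchSource(null, "secret.*")` keeps the exclude pattern, whereas the buggy variant silently discards it.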
Sorting fails on unmapped fields so the new propagation delay of the mappings
exposed this issue. I added explicit mappings as part of index creation to fix it.
when the primary shard is recovering its translog, make sure to wait for new mapping introductions till the mappings have been updated on the master before finalizing the recovery itself
also, this change performs the mapping updates in a more optimized manner by batching the types to change into a single set and sending after the translog has been replayed
also, remove the wait for mapping on master in the local state tests since this new behavior covers it
closes #6666
remove waiting for mapping on master since we do it in recovery
Some IO APIs can return after writing or reading only a part of the requested data. On these rare occasions, we should call the methods again to read/write the rest of the data. This has caused rare translog corruption while writing huge documents on Windows.
Notable parts of the commit:
- A new Channels class with utility methods for reading and writing to channels
- Writing or reading to channels is added to the forbidden API list
- Added locking to SimpleFsTranslogFile
- Removed FileChannelInputStream which was not used
Closes #6441, #6576
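The retry loop described above can be sketched as follows. This is a minimal illustration, not the `Channels` class from the commit: `ChannelIoSketch`, `writeFully`, and `readFully` are hypothetical names, and only the `java.nio` channel interfaces are real.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper; the method names are assumptions made for this sketch.
final class ChannelIoSketch {

    /** Writes every remaining byte of the buffer, calling write() again after a partial write. */
    static void writeFully(WritableByteChannel channel, ByteBuffer source) throws IOException {
        while (source.hasRemaining()) {
            channel.write(source); // may consume fewer bytes than remain
        }
    }

    /** Fills the buffer completely, calling read() again after a partial read. */
    static void readFully(ReadableByteChannel channel, ByteBuffer target) throws IOException {
        while (target.hasRemaining()) {
            if (channel.read(target) < 0) {
                // End of stream before the requested amount was available.
                throw new EOFException("channel ended with " + target.remaining() + " bytes still expected");
            }
        }
    }
}
```

A single `write` or `read` call only reports how many bytes were transferred, so the loop relies on the buffer position to know when the full request has been satisfied.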
All settings should be passed as settings and the environment should not
influence the test cluster settings. The settings we care about, i.e.
`es.node.mode` and `es.logger.level`, should be passed via settings.
This allows tests to override these settings if they for instance need
`network` transport to operate at all.
Closes #6663
Today `index.version.created` depends on the version of the master
node in the cluster. This is potentially causing new features to be
expected on shards that didn't exist when the index was created.
There is no notion of `where was the shard allocated first` such that
`index.version.created` can't be reliably used as a feature flag.
With this change the `index.version.created` can be reliably used to
determine the smallest node version at the point in time when the index
was created. This means we can safely use certain features that would
for instance require reindexing and / or would not work unless the
entire index (all shards and segments) has been created with a certain
version or newer.
Closes #6660
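The idea can be sketched as below, assuming node versions are represented as hypothetical integer ids that grow with each release. `IndexCreatedVersionSketch` and its methods are illustrative names only and do not reflect the actual implementation.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical sketch; version ids and method names are assumptions.
final class IndexCreatedVersionSketch {

    /** Returns the smallest version id among the nodes present when the index is created. */
    static int smallestNodeVersion(Collection<Integer> nodeVersionIds) {
        if (nodeVersionIds.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        return Collections.min(nodeVersionIds);
    }

    /** A feature is safe to use only if the index was created at or after the version that introduced it. */
    static boolean supportsFeature(int indexCreatedVersion, int featureIntroducedIn) {
        return indexCreatedVersion >= featureIntroducedIn;
    }
}
```

Taking the minimum across nodes, rather than the version of the master alone, means no shard can have been created by a node older than the recorded version.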
When three threads are trying to write checksums at the same time, it's possible for all three threads to obtain the same checksum file name A. Then the first thread enters the synchronized section, creates the file with name A and exits. The second thread enters the synchronized section, checks that A exists, creates file A+1 and exits the critical section. Then it proceeds to clean up and deletes all checksum files including A. If it happens before the third thread enters the synchronized section, it's possible for the third thread to check for A and since it no longer exists create the checksum file A the second time, which triggers "file _checksums-XXXXXXXXXXXXX was already written to" exception in MockDirectoryWrapper and fails recovery.
They were due to a combination of mappings propagation delays and the behavior
of MapperService.smartName(String) so mappings are now configured up-front.
Makes it possible to delete snapshots that are missing some of the metadata files. This can happen if snapshot creation failed because the repository drive ran out of disk space.
Closes #6383