The `_create` API is handy way to specify an index operation should only be done if the document doesn't exist. This is currently implemented in explicit code paths all the way down to the engine. However, conceptually this is no different than any other versioned operation - instead of requiring a document is on a specific version, we require it to be deleted (or non-existent). This PR removes Engine.Create in favor of a slight extension in the VersionType logic.
There are however a couple of side effects:
- DocumentAlreadyExistsException is removed and VersionConflictException is used instead (with an improved error message)
- Update will reject version parameters if the upsert option is used (it doesn't compute anyway).
- Translog.Create is also removed infavor of Translog.Index (that's OK because their binary format was the same, so we can just read Translog.Index of the translog file)
Closes#13955
It is rarely used and was not consistently handled by different distributions anyway.
This commit also adds a test for specifying CONF_DIR when installing plugins and
starting elasticsearch.
relates to #12712 and #12954closes#5329closes#13715
With 2.0, we now bind to `localhost` by default instead of binding to the network card and use its IP address.
When the discovery plugin gets from AWS API the list of nodes that should form the cluster, this list is pinged then. But as each node is bound to `localhost`, ping does not get an answer and the node elects itself as the master node.
`network.host` must be set.
Closes#13589.
Types are still optional, but if you do provide them, they can't be null. Split the existing constructor that accepted nnull into two, one that accepts no arguments, and another one that accepts the types argument, which must be not null.
Also trimmed down different ways of setting ids, some were misleading as they would always add the ids to the existing ones and not set them, the add prefix makes that clear. Left `addIds` method that accepts a varargs argument. Added check for ids not be null.
TermsLookupQueryBuilder was left around only for bw comp reasons, but TermsQueryBuilder is its replacement. We can remove it now that it is clear query refactoring goes in master (3.0).
This commit fixes ping timeout settings inconsistencies in
ZenDiscovery. In particular, the documentation refers to the ping
timeout setting as discovery.zen.ping_timeout but the code was
ultimately using discovery.zen.ping.timeout if this was set.
This commit also changes all instances of the raw string
“discovery.zen.ping_timeout” to the constant
o.e.d.z.ZenDiscovery.SETTING_PING_TIMEOUT.
Finally, this commit removes the legacy setting
"discovery.zen.initial_ping_timeout".
Closes#6579, #9581, #9908
The current MoreLikeThisQueryBuilder validation checks for existence of at
least one `like` text or item. This is hard to check in setters, so this PR
tries to change the construction of the query so that we can do these checks
already at construction time.
Changing to using arrays for fieldnames, likeTexts, likeItems, unlikeTexts
and unlikeItems. `likeTexts` and/or `likeItems` need to be specified at
construction time to validate we have at least one item there.
Relates to #10217
This commit moves the size and ops based flush into a synchronous API into
IndexShard and removes the time-based flush alltogether since it' basically
covered by the inactive async flush API we have today. The functionality doesn't
need to be covered by scheduled task and async APIs while we can actually make all
the decisions in a sync manner which is way easier to control and to test.
Closes#13707
Refactor the function_score query so it can be parsed on the coordinating node, split parse into fromXContent and toQuery, make FunctionScoreQueryBuilder Writeable.
Closes#13653
Before this commit he tests always run bin/plugin as root which is somewhat
unrealistic and causes trouble (log files owned by root instead of
elasticsearch). After this commit `bin/plugin` runs as root when elasticsearch
is installed via the repository and as elasticsearch otherwise which is much
more realistic.
This also adds extra timeout to starting elasticsearch which is required
when all the plugins are installed. And it fixes up a problem with logging
elasticsearch's log if elasticsearch doesn't start which came up multiple
time while debugging this problem.
Also adds docs recommending running `bin/plugin` as the user that owns the
Elasticsearch files or root if installed with the packages.
Closes#13557
Moving validation from validate() to constructors and setters for the
following query builders:
* GeoDistanceQueryBuilder
* GeoDistanceRangeQueryBuilder
* GeoPolygonQueryBuilder
* GeoShapeQueryBuilder
* GeohashCellQuery
* TermsQueryBuilder
Relates to #10217
Until now we had a cloud-azure plugin which is providing 3 distinct features:
* discovery on Azure
* snapshot/restore on Aure
* SMB store
This commit splits the plugin by feature so people can use either one or the other or both features.
Doc is updated accordingly.
This PR is the second batch in moving the query validation we started
to collect in the validate() method to the corresponding setters
and constructors.
With 2.0, we now bind to `localhost` by default instead of binding to the network card and use its IP address.
When the discovery plugin gets from Azure API the list of nodes that should form the cluster, this list is pinged then. But as each node is bound to `localhost`, ping does not get an answer and the node elects itself as the master node.
Closes#13591
Requesting a million hits, or page 100,000 is always a bad idea, but users
may not be aware of this. This adds a per-index limit on the maximum size +
from that can be requested which defaults to 10,000.
This should not interfere with deep-scrolling.
Closes#9311
* Dropped ScoreType in favour of Lucene's ScoreMode
* Removed `score_type` option from `has_child` and `has_parent` queries in favour for the already existing `score_mode` option.
* Removed the score mode `sum` in favour for the already existing `total` score mode. (`sum` doesn't exist in Lucene's ScoreMode class)
* If `max_children` is set to `0` it now really means that zero children are allowed to match.
This add equals, hashcode, read/write methods, separates toQuery and JSON parsing and adds tests.
Also moving MatchQueryBuilder.Type to MatchQuery to MatchQuery, adding serialization and hashcode,
equals there.
Relates to #10217
Allocation filtering by IP only works today using the node host address. But in some cases, you might want to filter using the publish address which could be different.
This commit splits HasParentQueryParser into toQuery and fromXContent.
This change also deprecates several keys in favor of simplified settings
and adds basic unittests for HasParentQueryParser.
Relates to #10217
Previously the parser could take any Term Vectors request, but this would be
not the case of the builder which would still use MultiGetRequest.Item. This
introduces a new Item class which is used by both the builder and parser.
Beyond that the rest is mostly cleanups such as:
1) Deprecating the ignoreLike methods, in favor to using unlike.
2) Deprecating and renaming MoreLikeThisBuilder#addItem to addLikeItem.
3) Ordering the methods of MoreLikeThisBuilder more logically.
This change is needed for the upcoming query refactoring of MLT.
Closes#13372
This is an intial commit that splits HasChildQueryParser / Builder into
the two seperate steps. This one is particularly nasty since it transports
a pretty wild InnerHits object that needs heavy refactoring. Yet, this commit
has still some nocommits and needs more tests and maybe another cleanup but
it's a start to get the code out there.
This pipeline will calculate percentiles over a set of sibling buckets. This is an exact
implementation, meaning it needs to cache a copy of the series in memory and sort it to determine
the percentiles.
This comes with a few limitations: to prevent serializing data around, only the requested percentiles
are calculated (unlike the TDigest version, which allows the java API to ask for any percentile).
It also needs to store the data in-memory, resulting in some overhead if the requested series is
very large.
Until now we had a cloud-aws plugin which is providing 2 disctinct features:
* discovery on EC2
* snapshot/restore on S3
This commit splits the plugin by feature so people can use either one or the other or both features.
Doc is updated accordingly.
The shaded version of elasticsearch was built at the very beginning to avoid dependency conflicts in a specific case where:
* People use elasticsearch from Java
* People needs to embed elasticsearch jar within their own application (as it's today the only way to get a `TransportClient`)
* People also embed in their application another (most of the time older) version of dependency we are using for elasticsearch, such as: Guava, Joda, Jackson...
This conflict issue can be solved within the projects themselves by either upgrade the dependency version and use the one provided by elasticsearch or by shading elasticsearch project and relocating some conflicting packages.
Example
-------
As an example, let's say you want to use within your project `Joda 2.1` but elasticsearch `2.0.0-beta1` provides `Joda 2.8`.
Let's say you also want to run all that with shield plugin.
Create a new maven project or module with:
```xml
<groupId>fr.pilato.elasticsearch.test</groupId>
<artifactId>es-shaded</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<elasticsearch.version>2.0.0-beta1</elasticsearch.version>
</properties>
<dependencies>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.plugin</groupId>
<artifactId>shield</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
</dependencies>
```
And now shade and relocate all packages which conflicts with your own application:
```xml
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<relocations>
<relocation>
<pattern>org.joda</pattern>
<shadedPattern>fr.pilato.thirdparty.joda</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
```
You can create now a shaded version of elasticsearch + shield by running `mvn clean install`.
In your project, you can now depend on:
```xml
<dependency>
<groupId>fr.pilato.elasticsearch.test</groupId>
<artifactId>es-shaded</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.1</version>
</dependency>
```
Build then your TransportClient as usual:
```java
TransportClient client = TransportClient.builder()
.settings(Settings.builder()
.put("path.home", ".")
.put("shield.user", "username:password")
.put("plugin.types", "org.elasticsearch.shield.ShieldPlugin")
)
.build();
client.addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress("localhost", 9300)));
// Index some data
client.prepareIndex("test", "doc", "1").setSource("foo", "bar").setRefresh(true).get();
SearchResponse searchResponse = client.prepareSearch("test").get();
```
If you want to use your own version of Joda, then import for example `org.joda.time.DateTime`. If you want to access to the shaded version (not recommended though), import `fr.pilato.thirdparty.joda.time.DateTime`.
You can run a simple test to make sure that both classes can live together within the same JVM:
```java
CodeSource codeSource = new org.joda.time.DateTime().getClass().getProtectionDomain().getCodeSource();
System.out.println("unshaded = " + codeSource);
codeSource = new fr.pilato.thirdparty.joda.time.DateTime().getClass().getProtectionDomain().getCodeSource();
System.out.println("shaded = " + codeSource);
```
It will print:
```
unshaded = (file:/path/to/joda-time-2.1.jar <no signer certificates>)
shaded = (file:/path/to/es-shaded-1.0-SNAPSHOT.jar <no signer certificates>)
```
This PR also removes fully-loaded module.
By the way, the project can now build with Maven 3.3.3 so we can relax a bit our maven policy.
Prior to 2.0 we summed up the available space on all disk on a node
due to the raid-0 like behavior. Now we don't do this anymore and use the
min & max disk space to make decisions.
Closes#13106
detect_noop is pretty cheap and noop updates compartively expensive so this
feels like a sensible default.
Also had to do some testing and documentation around how _ttl works with
detect_noop.
Closes#11282
Until a couple of hours ago we expected the position_offset_gap to default
to 0 in 2.0 and 100 in 2.1. We decided it was worth backporting that new
default to 2.0. So now that its backported we need to teach 2.1 that 2.0
also defaults to 100.
Closes#7268
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 its simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take.
Also postition_offset_gaps less than 0 aren't allowed at all.
New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doest match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the the value from the custom analyzer shines through
Closes#7268
The setting `plugin.types` is currently used to load plugins from the
classpath. This is necessary in tests, as well as the transport client.
This change removes the setting, and replaces it with the ability to
directly add plugins when building a transport client, as well as
infrastructure in the integration tests to specify which plugin classes
should be loaded on each node.
Multicast has known issues (see #12999 and #12993). This change moves
multicast into a plugin, and deprecates it in the docs. It also allows
for plugging in multiple zen ping implementations.
closes#13019
Fix unicast discovery to work when a host has multiple addresses.
Ban dangerous methods in java.net with forbidden APIs.
Fix ipv6 bugs and formatting of network addresses everywhere.
Closes#12999Closes#12993
Squashed commit of the following:
commit 6c1aa001d091c5cf25212a53dc701fb704337f1e
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 14:25:43 2015 -0400
Fix these to be correct with addresses just in case
commit 648215627e84abf58a71400e7dc9ae775efb71d6
Merge: d00561b 41d8fbe
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 13:23:09 2015 -0400
Merge branch 'master' into unicast_all_the_way_down
commit d00561b76fd1aa5850699f7901f3dae3d4d402b7
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 16:38:50 2015 +0200
limit local ports to 5 in UnicastZenPing
commit e2e15c594006746cbe24432694294a71cc99deb8
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 10:32:47 2015 -0400
fix port limiting
commit 10153cb7adadda81a1f482445e703836b65cf5e2
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 10:18:37 2015 -0400
don't serialize scopeids: that's broken
commit 2aa63d43db2baec68a2e9bc227cfeb85dfeb4f83
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 16:06:51 2015 +0200
restore @Network
commit c840f1d1ef438826ae1ecfd5e45942a0e30dc9c0
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 16:02:30 2015 +0200
Use NetworkAddress.formatAddress where applicable in plugins
commit 374ce878852b35d626b7a29c8c4773545b0e9ddd
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 15:34:06 2015 +0200
Use NetworkAddress.formatAddress where applicable
commit e7a606d63f1bc43c1b62b6e17adf707c76d43a15
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 10:17:57 2015 +0200
Add @Multicast annotation to disable multicast tests by default.
We only run multicast tests now when we explicitly state it. A working
multicast env is required which is not always the case.
commit 2d7d2d0347179696ab41f71f048b13305014c85b
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 09:51:28 2015 +0200
Remove extra check for local mode in InternalTestCluster
commit dda59ac39aa136d4687b9274c2692cd77f8b8f66
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 09:37:03 2015 +0200
Handle node mode across entire test cluster
We used static methods reading sys properties to define the node mode
per cluster. this had lots of problems when tests couldn't cope with
mixed or only local mode. Now we are passing it down to the cluster from the test
which allows to @SuppressNetworkMode / @SupressLocalMode on the test to force
consistent node configurations.
commit 058197b7a408318995c88ce7f6762e32348de0de
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 03:19:14 2015 -0400
really ban InetSocketAddress's trappy method and break build and go to sleep, sorry
commit ac8779185aee1e17e6f5a81766290fdfc9c603ba
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 03:16:52 2015 -0400
Ban methods that might surprisingly cause DNS lookups
commit e64fe3dff2b11503e5f2831eb9863d64f56c5538
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 02:59:05 2015 -0400
Add unit test
commit f15434f20fb1a3691b1cc16028597d8fae937e05
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 02:39:02 2015 -0400
fix ipv6 formatting bugs
commit 05c2c74098052c75fbb79ea1818a295ef2e03e30
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 02:12:05 2015 -0400
format addresses correctly so I can actually read what comes out of our logs and stats apis
commit 4f9389dcf1e8925f23153c5eb271b4ce2294dbaf
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 21:26:52 2015 -0400
ban dangerous methods in java.net
commit 6aacd4d9925f324903d1d099a6cf5f862aeaf677
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 20:59:24 2015 -0400
ban lenient method
commit f466a842c60163d1f4554bdce8a4163edb534c2c
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 00:29:00 2015 +0200
fix tests to not mix local transport and zen unicast disco
commit 0de007a33b33fb68cf85cd86db4ca4f8ce10bbc9
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 00:10:07 2015 +0200
fix tests to not mix local transport and zen unicast disco
commit 539f6ca6e5137e0d496239adc8684688dedcc824
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 00:02:01 2015 +0200
fix tests to not mix local transport and zen unicast disco
commit 004c2881b25467f332acc8c9f9e92b1f0f9d314e
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 17:51:45 2015 -0400
Fix multinode
commit 54113af325ce31571811c49fdaae89d5687be4ba
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 17:36:45 2015 -0400
fix integration tests
commit 0156a77a56319d6b9737ec6a531992052e50bd59
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Aug 19 23:32:18 2015 +0200
enable multicast in MulticastZenPingIT.java
commit 1791caa35da853ce0122485fa3fd4674c671ec6e
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 17:23:16 2015 -0400
Fix constant
commit 22820b53e0b2dc9fd47145c2bc29ce912a8fd484
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Aug 19 22:59:09 2015 +0200
give it some extra ids for local transport crazyness
commit b2138fafa94a8a085813fd48356df63e57ade5b3
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Aug 19 22:51:42 2015 +0200
pass on local addresses from configured transport rather than hard code IP addresses
commit 1bf5de1f457b081e0ce262b57d2b55d39c434156
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Aug 19 22:04:31 2015 +0200
fix PluggableTransportModuleIT.java to use local disco and detach port limit for node local disco
commit b6706eddfa04c43947c16551359ae98a463d34aa
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 14:16:03 2015 -0400
Default to unicast discovery, with default host list of 127.0.0.1, [::1]
The documentation states that scrolls are automatically closed when all
documents are consumed, but this is not the case. I first tried to fix
the code to close scrolls automatically but this made REST tests fail
because clearing a scroll that is already closed returned a 4xx error
instead of a 2xx code, so this has probably been this way for a very long
time.
At the moment, when installing from an url, a user provides the plugin name on
the command line like:
* bin/plugin install [plugin-name] --url [url]
This can lead to problems when picking an already existing name from another
plugin, and can potentially overwrite plugins already installed with that name.
This, this PR introduces a mandatory `name` property to the plugin descriptor
file which replaces the name formerly provided by the user.
With the addition of the `name` property to the plugin descriptor file, the user
does not need to specify the plugin name any longer when installing from a file
or url. Because of this, all arguments to `plugin install` command are now
either treated as a symbolic name, a URL or a file without the need to specify
this with an explicit option.
The new syntax for `plugin install` is now:
bin/plugin install [name or url]
* downloads official plugin
bin/plugin install analysis-kuromoji
* downloads github plugin
bin/plugin install lmenezes/elasticsearch-kopf
* install from URL or file
bin/plugin install http://link.to/foo.zip
bin/plugin install file:/path/to/foo.zip
If the argument does not parse to a valid URL, it is assumed to be a name and the
download location is resolved like before. Regardless of the source location of
the plugin, it is extracted to a temporary directory and the `name` property from
the descriptor file is used to determine the final install location.
Relates to #12715