This pipeline will calculate percentiles over a set of sibling buckets. This is an exact
implementation, meaning it needs to cache a copy of the series in memory and sort it to determine
the percentiles.
This comes with a few limitations: to avoid having to serialize the data, only the requested percentiles
are calculated (unlike the TDigest version, which allows the Java API to ask for any percentile).
It also needs to store the data in-memory, resulting in some overhead if the requested series is
very large.
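As a rough illustration of the exact approach, the sketch below copies the bucket values, sorts the copy and reads each requested percentile from it. The class name, the method name and the nearest-rank index rule are assumptions made for this example; the pipeline's real code may pick indices differently.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only, not the pipeline's real code.
final class ExactPercentiles {

    // Returns the value at each requested percentile (0-100) using the
    // nearest-rank rule (an assumption; other interpolation rules exist).
    static double[] percentiles(double[] bucketValues, double[] requested) {
        if (bucketValues.length == 0) {
            throw new IllegalArgumentException("no bucket values");
        }
        // Cache a copy of the series in memory and sort it.
        double[] sorted = bucketValues.clone();
        Arrays.sort(sorted);

        double[] result = new double[requested.length];
        for (int i = 0; i < requested.length; i++) {
            int rank = (int) Math.ceil(requested[i] / 100.0 * sorted.length);
            // Clamp so that percentile 0 maps to the smallest value.
            int index = Math.min(Math.max(rank - 1, 0), sorted.length - 1);
            result[i] = sorted[index];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] values = {50, 15, 40, 20, 35};
        double[] result = percentiles(values, new double[] {30, 50, 100});
        System.out.println(Arrays.toString(result));
    }
}
```

The copy and the sort are where the memory overhead mentioned above comes from: the whole series has to be held at once before any percentile can be read.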
Until now we had a cloud-aws plugin which provided two distinct features:
* discovery on EC2
* snapshot/restore on S3
This commit splits the plugin by feature so people can use either one or the other or both features.
The docs are updated accordingly.
The shaded version of elasticsearch was built at the very beginning to avoid dependency conflicts in a specific case where:
* People use elasticsearch from Java
* People need to embed the elasticsearch jar within their own application (as it's today the only way to get a `TransportClient`)
* People also embed in their application another (most of the time older) version of a dependency we are using for elasticsearch, such as: Guava, Joda, Jackson...
This conflict can be solved within the projects themselves, either by upgrading the dependency version and using the one provided by elasticsearch, or by shading the elasticsearch project and relocating the conflicting packages.
Example
-------
As an example, let's say you want to use `Joda 2.1` within your project but elasticsearch `2.0.0-beta1` provides `Joda 2.8`.
Let's say you also want to run all that with the Shield plugin.
Create a new maven project or module with:
```xml
<groupId>fr.pilato.elasticsearch.test</groupId>
<artifactId>es-shaded</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
    <elasticsearch.version>2.0.0-beta1</elasticsearch.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>${elasticsearch.version}</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch.plugin</groupId>
        <artifactId>shield</artifactId>
        <version>${elasticsearch.version}</version>
    </dependency>
</dependencies>
```
And now shade and relocate all packages which conflict with your own application:
```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <relocations>
                            <relocation>
                                <pattern>org.joda</pattern>
                                <shadedPattern>fr.pilato.thirdparty.joda</shadedPattern>
                            </relocation>
                        </relocations>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```
You can now create a shaded version of elasticsearch + shield by running `mvn clean install`.
In your project, you can now depend on:
```xml
<dependency>
    <groupId>fr.pilato.elasticsearch.test</groupId>
    <artifactId>es-shaded</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>joda-time</groupId>
    <artifactId>joda-time</artifactId>
    <version>2.1</version>
</dependency>
```
Then build your TransportClient as usual:
```java
TransportClient client = TransportClient.builder()
        .settings(Settings.builder()
                .put("path.home", ".")
                .put("shield.user", "username:password")
                .put("plugin.types", "org.elasticsearch.shield.ShieldPlugin")
        )
        .build();
client.addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress("localhost", 9300)));
// Index some data
client.prepareIndex("test", "doc", "1").setSource("foo", "bar").setRefresh(true).get();
SearchResponse searchResponse = client.prepareSearch("test").get();
```
If you want to use your own version of Joda, then import for example `org.joda.time.DateTime`. If you want to access the shaded version (not recommended though), import `fr.pilato.thirdparty.joda.time.DateTime`.
You can run a simple test to make sure that both classes can live together within the same JVM:
```java
CodeSource codeSource = new org.joda.time.DateTime().getClass().getProtectionDomain().getCodeSource();
System.out.println("unshaded = " + codeSource);
codeSource = new fr.pilato.thirdparty.joda.time.DateTime().getClass().getProtectionDomain().getCodeSource();
System.out.println("shaded = " + codeSource);
```
It will print:
```
unshaded = (file:/path/to/joda-time-2.1.jar <no signer certificates>)
shaded = (file:/path/to/es-shaded-1.0-SNAPSHOT.jar <no signer certificates>)
```
This PR also removes the fully-loaded module.
By the way, the project can now build with Maven 3.3.3, so we can relax our Maven policy a bit.
Prior to 2.0 we summed up the available space on all disks on a node
due to the RAID-0-like behavior. Now we don't do this anymore and use the
min & max disk space to make decisions.
Closes #13106
detect_noop is pretty cheap and noop updates are comparatively expensive, so this
feels like a sensible default.
Also had to do some testing and documentation around how _ttl works with
detect_noop.
Closes #11282
Until a couple of hours ago we expected the position_offset_gap to default
to 0 in 2.0 and 100 in 2.1. We decided it was worth backporting that new
default to 2.0. So now that it's backported we need to teach 2.1 that 2.0
also defaults to 100.
Closes #7268
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 it's simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. The exception is an index created before
2.1, in which case the old default of 0 has to apply.
Also, position_offset_gaps less than 0 aren't allowed at all.
New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doesn't match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the value from the custom analyzer shines through
Closes #7268
The setting `plugin.types` is currently used to load plugins from the
classpath. This is necessary in tests, as well as the transport client.
This change removes the setting, and replaces it with the ability to
directly add plugins when building a transport client, as well as
infrastructure in the integration tests to specify which plugin classes
should be loaded on each node.
Multicast has known issues (see #12999 and #12993). This change moves
multicast into a plugin, and deprecates it in the docs. It also allows
for plugging in multiple zen ping implementations.
Closes #13019
Fix unicast discovery to work when a host has multiple addresses.
Ban dangerous methods in java.net with forbidden APIs.
Fix ipv6 bugs and formatting of network addresses everywhere.
Closes #12999
Closes #12993
Squashed commit of the following:
commit 6c1aa001d091c5cf25212a53dc701fb704337f1e
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 14:25:43 2015 -0400
Fix these to be correct with addresses just in case
commit 648215627e84abf58a71400e7dc9ae775efb71d6
Merge: d00561b 41d8fbe
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 13:23:09 2015 -0400
Merge branch 'master' into unicast_all_the_way_down
commit d00561b76fd1aa5850699f7901f3dae3d4d402b7
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 16:38:50 2015 +0200
limit local ports to 5 in UnicastZenPing
commit e2e15c594006746cbe24432694294a71cc99deb8
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 10:32:47 2015 -0400
fix port limiting
commit 10153cb7adadda81a1f482445e703836b65cf5e2
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 10:18:37 2015 -0400
don't serialize scopeids: that's broken
commit 2aa63d43db2baec68a2e9bc227cfeb85dfeb4f83
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 16:06:51 2015 +0200
restore @Network
commit c840f1d1ef438826ae1ecfd5e45942a0e30dc9c0
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 16:02:30 2015 +0200
Use NetworkAddress.formatAddress where applicable in plugins
commit 374ce878852b35d626b7a29c8c4773545b0e9ddd
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 15:34:06 2015 +0200
Use NetworkAddress.formatAddress where applicable
commit e7a606d63f1bc43c1b62b6e17adf707c76d43a15
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 10:17:57 2015 +0200
Add @Multicast annotation to disable multicast tests by default.
We only run multicast tests now when we explicitly state it. A working
multicast env is required which is not always the case.
commit 2d7d2d0347179696ab41f71f048b13305014c85b
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 09:51:28 2015 +0200
Remove extra check for local mode in InternalTestCluster
commit dda59ac39aa136d4687b9274c2692cd77f8b8f66
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 09:37:03 2015 +0200
Handle node mode across entire test cluster
We used static methods reading sys properties to define the node mode
per cluster. this had lots of problems when tests couldn't cope with
mixed or only local mode. Now we are passing it down to the cluster from the test
which allows to @SuppressNetworkMode / @SupressLocalMode on the test to force
consistent node configurations.
commit 058197b7a408318995c88ce7f6762e32348de0de
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 03:19:14 2015 -0400
really ban InetSocketAddress's trappy method and break build and go to sleep, sorry
commit ac8779185aee1e17e6f5a81766290fdfc9c603ba
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 03:16:52 2015 -0400
Ban methods that might surprisingly cause DNS lookups
commit e64fe3dff2b11503e5f2831eb9863d64f56c5538
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 02:59:05 2015 -0400
Add unit test
commit f15434f20fb1a3691b1cc16028597d8fae937e05
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 02:39:02 2015 -0400
fix ipv6 formatting bugs
commit 05c2c74098052c75fbb79ea1818a295ef2e03e30
Author: Robert Muir <rmuir@apache.org>
Date: Thu Aug 20 02:12:05 2015 -0400
format addresses correctly so I can actually read what comes out of our logs and stats apis
commit 4f9389dcf1e8925f23153c5eb271b4ce2294dbaf
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 21:26:52 2015 -0400
ban dangerous methods in java.net
commit 6aacd4d9925f324903d1d099a6cf5f862aeaf677
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 20:59:24 2015 -0400
ban lenient method
commit f466a842c60163d1f4554bdce8a4163edb534c2c
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 00:29:00 2015 +0200
fix tests to not mix local transport and zen unicast disco
commit 0de007a33b33fb68cf85cd86db4ca4f8ce10bbc9
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 00:10:07 2015 +0200
fix tests to not mix local transport and zen unicast disco
commit 539f6ca6e5137e0d496239adc8684688dedcc824
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Aug 20 00:02:01 2015 +0200
fix tests to not mix local transport and zen unicast disco
commit 004c2881b25467f332acc8c9f9e92b1f0f9d314e
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 17:51:45 2015 -0400
Fix multinode
commit 54113af325ce31571811c49fdaae89d5687be4ba
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 17:36:45 2015 -0400
fix integration tests
commit 0156a77a56319d6b9737ec6a531992052e50bd59
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Aug 19 23:32:18 2015 +0200
enable multicast in MulticastZenPingIT.java
commit 1791caa35da853ce0122485fa3fd4674c671ec6e
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 17:23:16 2015 -0400
Fix constant
commit 22820b53e0b2dc9fd47145c2bc29ce912a8fd484
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Aug 19 22:59:09 2015 +0200
give it some extra ids for local transport crazyness
commit b2138fafa94a8a085813fd48356df63e57ade5b3
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Aug 19 22:51:42 2015 +0200
pass on local addresses from configured transport rather than hard code IP addresses
commit 1bf5de1f457b081e0ce262b57d2b55d39c434156
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Aug 19 22:04:31 2015 +0200
fix PluggableTransportModuleIT.java to use local disco and detach port limit for node local disco
commit b6706eddfa04c43947c16551359ae98a463d34aa
Author: Robert Muir <rmuir@apache.org>
Date: Wed Aug 19 14:16:03 2015 -0400
Default to unicast discovery, with default host list of 127.0.0.1, [::1]
The documentation states that scrolls are automatically closed when all
documents are consumed, but this is not the case. I first tried to fix
the code to close scrolls automatically but this made REST tests fail
because clearing a scroll that is already closed returned a 4xx error
instead of a 2xx code, so this has probably been this way for a very long
time.
At the moment, when installing from a URL, a user provides the plugin name on
the command line like:
* bin/plugin install [plugin-name] --url [url]
This can lead to problems when picking an already existing name from another
plugin, and can potentially overwrite plugins already installed with that name.
Thus, this PR introduces a mandatory `name` property to the plugin descriptor
file which replaces the name formerly provided by the user.
With the addition of the `name` property to the plugin descriptor file, the user
no longer needs to specify the plugin name when installing from a file
or URL. Because of this, every argument to the `plugin install` command is now
treated as either a symbolic name, a URL or a file, without the need to specify
this with an explicit option.
The new syntax for `plugin install` is now:
    bin/plugin install [name or url]
* downloads official plugin
    bin/plugin install analysis-kuromoji
* downloads github plugin
    bin/plugin install lmenezes/elasticsearch-kopf
* install from URL or file
    bin/plugin install http://link.to/foo.zip
    bin/plugin install file:/path/to/foo.zip
If the argument does not parse to a valid URL, it is assumed to be a name and the
download location is resolved like before. Regardless of the source location of
the plugin, it is extracted to a temporary directory and the `name` property from
the descriptor file is used to determine the final install location.
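The "does it parse as a URL" check can be sketched roughly as below. This is only an illustration using the JDK's `java.net.URL` constructor; the class and method names are hypothetical and this is not the plugin manager's actual code.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical sketch for illustration; not the plugin manager's real code.
final class PluginInstallArgument {

    // Returns true when the argument parses as a URL (http:, file:, ...),
    // false when it should be treated as a plugin name.
    static boolean isUrl(String argument) {
        try {
            // Deprecated in recent JDKs but still available; it throws when
            // the argument has no known protocol.
            new URL(argument);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] examples = {
            "analysis-kuromoji",
            "lmenezes/elasticsearch-kopf",
            "http://link.to/foo.zip",
            "file:/path/to/foo.zip"
        };
        for (String example : examples) {
            System.out.println(example + " -> " + (isUrl(example) ? "URL or file" : "name"));
        }
    }
}
```

Arguments without a protocol, such as `analysis-kuromoji`, fail to parse and fall through to name resolution.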
Relates to #12715
This moves the `murmur3` field to the `mapper-murmur3` plugin and fixes its
defaults so that values will not be indexed by default, as the only purpose
of this field is to speed up `cardinality` aggregations on high-cardinality
string fields, which only requires doc values.
I also removed the `rehash` option from the `cardinality` aggregation as it
doesn't bring much value (rehashing is cheap), and removing it made it possible
to drop the coupling between the `cardinality` aggregation and the `murmur3` field.
Closes #12874
When elasticsearch is configured by interface (or default: loopback interfaces),
bind to all addresses on the interface rather than an arbitrary one.
If the publish address is not specified, default it from the bound addresses
based on the following sort ordering:
* ipv4/ipv6 (java.net.preferIPv4Stack, defaults to true)
* ordinary addresses
* site-local addresses
* link local addresses
* loopback addresses
Only one address is published, and multicast is still always over ipv4: these
need to be future improvements.
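A minimal sketch of the sort ordering listed above, assuming the address family is the first sort key and the address scope the second. The class and method names are hypothetical, and the IPv4 preference is passed in as a parameter rather than read from `java.net.preferIPv4Stack`.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch for illustration; not the actual network code.
final class PublishAddressOrder {

    // Parses an IP literal; literals do not trigger a DNS lookup.
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    // Lower rank sorts first: ordinary, site-local, link-local, loopback.
    static int scopeRank(InetAddress address) {
        if (address.isLoopbackAddress()) return 3;
        if (address.isLinkLocalAddress()) return 2;
        if (address.isSiteLocalAddress()) return 1;
        return 0;
    }

    static Comparator<InetAddress> comparator(boolean preferIPv4) {
        return Comparator
            .comparingInt((InetAddress a) -> (a instanceof Inet4Address) == preferIPv4 ? 0 : 1)
            .thenComparingInt(PublishAddressOrder::scopeRank);
    }

    // Picks the single address to publish from the bound addresses.
    static InetAddress pick(List<InetAddress> bound, boolean preferIPv4) {
        List<InetAddress> sorted = new ArrayList<>(bound);
        sorted.sort(comparator(preferIPv4));
        return sorted.get(0);
    }

    public static void main(String[] args) {
        List<InetAddress> bound = new ArrayList<>();
        bound.add(ip("127.0.0.1"));
        bound.add(ip("169.254.1.1"));
        bound.add(ip("10.0.0.5"));
        System.out.println(pick(bound, true).getHostAddress());
    }
}
```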
Closes #12906
Closes #12915
Squashed commit of the following:
commit 7e60833312f329a5749f9a256b9c1331a956d98f
Author: Robert Muir <rmuir@apache.org>
Date: Mon Aug 17 14:45:33 2015 -0400
fix java 7 compilation oops
commit c7b9f3a42058beb061b05c6dd67fd91477fd258a
Author: Robert Muir <rmuir@apache.org>
Date: Mon Aug 17 14:24:16 2015 -0400
Cleanup/fix logic around custom resolvers
commit bd7065f1936e14a29c9eb8fe4ecab0ce512ac08e
Author: Robert Muir <rmuir@apache.org>
Date: Mon Aug 17 13:29:42 2015 -0400
Add some unit tests for utility methods
commit 0faf71cb0ee9a45462d58af3d1bf214e8a79347c
Author: Robert Muir <rmuir@apache.org>
Date: Mon Aug 17 12:11:48 2015 -0400
localhost all the way down
commit e198bb2bc0d1673288b96e07e6e6ad842179978c
Merge: b55d092 b93a75f
Author: Robert Muir <rmuir@apache.org>
Date: Mon Aug 17 12:05:02 2015 -0400
Merge branch 'master' into network_cleanup
commit b55d092811d7832bae579c5586e171e9cc1ebe9d
Author: Robert Muir <rmuir@apache.org>
Date: Mon Aug 17 12:03:03 2015 -0400
fix docs, fix another bug in multicast (publish host = bad here!)
commit 88c462eb302b30a82585f95413927a5cbb7d54c4
Author: Robert Muir <rmuir@apache.org>
Date: Mon Aug 17 11:50:49 2015 -0400
remove nocommit
commit 89547d7b10d68b23d7f24362e1f4782f5e1ca03c
Author: Robert Muir <rmuir@apache.org>
Date: Mon Aug 17 11:49:35 2015 -0400
fix http too
commit 9b9413aca8a3f6397b5031831f910791b685e5be
Author: Robert Muir <rmuir@apache.org>
Date: Mon Aug 17 11:06:02 2015 -0400
Fix transport / interface code
Next up: multicast and then http
* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community
Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579
This allows `path.shared_data` to be added to the security manager while
still allowing a custom `data_path` for indices using shadow replicas.
For example, configuring `path.shared_data: /tmp/foo`, then creating an
index with:
```
POST /myindex
{
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "data_path": "/tmp/foo/bar/baz",
    "shadow_replicas": true
  }
}
```
The index will then reside in `/tmp/foo/bar/baz`.
`path.shared_data` defaults to `null` if not specified.
Resolves #12714
Relates to #11065
Instead of logging the entire `_source` in the indexing slowlog, we now log
just the first 1000 characters by default. This is controlled by the
`index.indexing.slowlog.source` setting, which can be set to `true` to log the
whole `_source`, `false` to log none of it, or a number to log at most that
many characters.
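The three kinds of value accepted by `index.indexing.slowlog.source` can be sketched as follows. The class and method are hypothetical and only illustrate the described behaviour; they are not the slowlog's real code.

```java
// Hypothetical helper for illustration; not the slowlog's real code.
final class SlowlogSource {

    // Returns the part of the source that would be logged for a setting
    // value of "true", "false" or a non-negative number of characters.
    static String truncate(String source, String setting) {
        if ("true".equals(setting)) {
            return source; // log the whole _source
        }
        if ("false".equals(setting)) {
            return ""; // log none of it
        }
        // Anything else must be a number; parseInt throws otherwise.
        int maxChars = Integer.parseInt(setting);
        if (maxChars < 0) {
            throw new IllegalArgumentException("negative length: " + setting);
        }
        return source.length() <= maxChars ? source : source.substring(0, maxChars);
    }

    public static void main(String[] args) {
        String source = "{\"foo\":\"bar\"}";
        System.out.println(truncate(source, "true"));
        System.out.println(truncate(source, "5"));
        System.out.println(truncate(source, "1000"));
    }
}
```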
Closes #4485
Because parsing of dynamic (not configured) arguments is limited, arguments
like `http.cors.enabled`, which don't map to a command line argument but will
become configuration, must come last. We need to mention this explicitly
in the documentation.
Also fixed some mentions of a memory index setting that does not exist anymore.
Closes #12758
This commit adds basic support to track the number of times scripts are
compiled and compiled scripts are evicted from the script cache. These
statistics are tracked at the node level.
Closes #12673
This commit updates the Zen Discovery documentation to explain which
nodes participate in master election (by default) as well as the
configuration parameters for controlling this.
Closes #12727
The `multi_match` query groups terms that have the same analyzer together and
then applies the boost of the first query in each group. This is not necessary
given that boosts for each term are already applied another way.
Today this is "unofficial" as conf/scripts, but some people
want to share scripts across different nodes and so on. Because
they cannot configure it, they are forced to use dirty hacks
like symbolic links, which isn't going to work: we aren't going
to recursively scan conf/ and add permissions to all link targets
underneath it, that's crazy.
I really hate adding yet another configuration knob here, but
users resorting to using symlinks are going to be frustrated,
and do things in a more insecure way.
To ensure we have the same experience across operating systems
and shells, this commit uses the Java CLI parser instead of the shell's
getopt parsing to parse arguments.
This also adds support for paths that contain spaces.
Also, the commons-cli dependency was upgraded to 1.3.1 and tests have been added.
Changes
* new exit code, OK_AND_EXIT, allowing to tell the caller to exit, as everything
went as expected (e.g. when running a version output)
BWC breaking:
* execute() returns an ExitStatus instead of an integer, otherwise a command
has no way to signal whether the JVM should exit after a run.
This affects plugins that have command line tools
* -v used to be version, but is a verbose flag by default in the current CLI infra,
must be -V or --version now
* -X has been removed - the current implementation was useless anyway, as
it prefixed those properties with "es.". You should use
ES_JAVA_OPTS/JAVA_OPTS for JVM configuration
Date math index name resolution enables you to search a range of time-series indices, rather than searching all of your time-series indices and filtering the results or maintaining aliases. Limiting the number of indices that are searched reduces the load on the cluster and improves execution performance. For example, if you are searching for errors in your daily logs, you can use a date math name template to restrict the search to the past two days.
This adds an `ExpressionResolver` implementation that is responsible for resolving date math expressions in index names. This resolver is evaluated before wildcard expressions are evaluated.
The supported format is `<static_name{date_math_expr{date_format|timezone_id}}>`; date math expressions must be enclosed within angle brackets. The `date_format` is optional and defaults to `YYYY.MM.dd`. The `timezone_id` is optional too and defaults to `utc`.
The `{` character can be escaped by placing `\\` before it.
Closes #12059
HDRHistogram has been added as an option in the percentiles and percentile_ranks aggregations. It has one option, `number_significant_digits`, which controls the accuracy and memory size of the algorithm.
Closes #8324
The TransportSingleCustomOperationAction `prefer_local` option has been removed as it isn't worth the effort.
The TransportSingleShardAction will execute the operation on the receiving node if a concrete action doesn't provide a list of candidate shard routings to perform the operation on.
Moving the query building functionality from the parser to the builder's
new toQuery() method, analogous to other recent query refactorings.
Relates to #10217
Moving the query building functionality from the parser to the builder's
new doToQuery() method, analogous to other recent query refactorings.
Relates to #10217
Closes #12365
The release and smoke test python scripts used to install
plugins in the old fashion.
Also the BATS testing suite installed/removed plugins in that
way. Here the marvel tests have been removed, as marvel currently
does not work with the master branch.
In addition documentation has been updated as well, where it was
still missing.
In order to unify the handling and reuse the CLITool infrastructure
the plugin manager should make use of this as well.
This obsoletes the -i and --install options but requires the user
to use `install` as the first argument of the CLI.
This is basically just a port of the existing functionality, which
is also the reason why this is not a refactoring of the plugin manager,
which will come in a separate commit.
The `_index` field is now a completely virtual field thanks
to #12027. It is no longer necessary to index the actual value
of the index name.
Closes #12329