Commit Graph

1038 Commits

Author SHA1 Message Date
Robert Muir 0ed45c5bfb remove filesystem leniency 2015-12-21 14:16:53 -05:00
Robert Muir deaf8884e9 Fix exc handling 2015-12-21 13:04:22 -05:00
Robert Muir 3ffd1a5219 final 2015-12-21 12:54:33 -05:00
Robert Muir f81b12e327 minimize accessiblity, remove unused threadpool 2015-12-21 12:39:40 -05:00
Adrien Grand cf52e96c42 Upgrade to lucene-5.5.0-snapshot-1721183.
Some files that implement or use the Scorer API had to be changed because of
https://issues.apache.org/jira/browse/LUCENE-6919.
2015-12-21 17:02:08 +01:00
Adrien Grand ac393b7a31 Make mappings tests more realistic.
DocumentMapperParser has both parse and parseCompressed methods. Except that the
parse methods are ONLY used from the unit tests. This commit removes the parse
method and moves all tests to parseCompressed so that they test more
realistically how mappings are managed.

Then I renamed parseCompressed to parse given that this is the only alternative
anyway.
2015-12-21 10:44:00 +01:00
Robert Muir f67390e0c8 in the plugin: guard against HADOOP_HOME in environment on any platform.
hdfs fixture: minihdfs works on windows now, if things are properly set
but our test fixture still cannot launch this on windows.
2015-12-21 02:21:53 -05:00
Robert Muir 53530f1243 remove hacks, test fixtures are clean before each execution 2015-12-20 22:23:30 -05:00
Robert Muir 935c2c75f6 Remove slf4j hack 2015-12-20 22:08:18 -05:00
Robert Muir 04966bcc3e contain and improve hack 2015-12-20 21:02:03 -05:00
Robert Muir 03a2b6b01b Disable HDFS fixture on windows, it requires native libraries. 2015-12-20 16:30:19 -08:00
Robert Muir a37417085d blind stab at unit test issues on windows 2015-12-20 18:31:55 -05:00
Robert Muir ee546ff655 try to get windows working 2015-12-20 17:10:01 -05:00
Robert Muir 2347e3c373 Get forbidden apis passing again, this needs to be investigated 2015-12-20 16:17:17 -05:00
Robert Muir 7ac49bb278 Merge branch 'hdfs2-only' of github.com:costin/elasticsearch into hdfs2-only 2015-12-20 16:12:23 -05:00
Robert Muir 12a8428dfb Add MiniHDFS test fixture, started before integTest and shut down after.
Currently uses a hardcoded port (9999), need to apply MavenFilteringHack after it starts.
2015-12-20 16:00:37 -05:00
Martijn van Groningen 66d8b5342f s/PipelineStoreBootstrapper/IngestBootstrapper 2015-12-20 20:18:46 +01:00
Costin Leau 3204e87220 Restrict usage to HDFS only 2015-12-20 15:53:18 +02:00
Robert Muir ae89c6e51c Merge branch 'master' into hdfs2-only 2015-12-19 21:52:52 -05:00
Ryan Ernst 9cb4c82c58 Build: Add fixture capabilities to integ tests
This change adds a Fixture class for use by gradle. A Fixture is an
external process that integration tests will use. It can be added as a
dependsOn for integTest, and will automatically be shutdown upon success
or failure, as well as relevant information dumped on failure. There is
also an example fixture in this change.
2015-12-19 15:46:21 -08:00
Robert Muir 8c6f5a0c60 add failing test 2015-12-19 15:05:38 -08:00
Robert Muir 5dcccca848 add example fixture 2015-12-19 15:05:37 -08:00
Robert Muir d171773bdb remove leniency in tests 2015-12-19 04:39:01 -05:00
Robert Muir e2b2ee24fa Add licensing for dependencies 2015-12-19 03:06:40 -05:00
Robert Muir 9df447295c Fix unit tests (also works from IDE). 2015-12-19 02:43:27 -05:00
Robert Muir 3269beeb4d don't throw exceptions from ctor, guice is hell 2015-12-19 02:09:14 -05:00
Robert Muir f174e96a14 explicitly initialize some hadoop classes elevated, so we don't rely on classloading order.
maybe this allows us to do less stuff in doPriv later, we will see. at least it makes things
like unit testing easier.
2015-12-19 00:21:01 -05:00
Robert Muir 2e8c68d09b Remove no-longer needed domaincombiner stuff 2015-12-18 23:51:41 -05:00
Robert Muir 02fbd55118 enable thirdPartyAudit so you can see the crazy shit hadoop does 2015-12-18 23:45:05 -05:00
Robert Muir bc11962438 get full snapshot restore tests passing 2015-12-18 23:16:41 -05:00
Robert Muir fbe3d64ea4 add passing test that takes snapshot 2015-12-18 22:55:15 -05:00
Robert Muir 75ef9da53f get up to connectexception 2015-12-18 22:11:58 -05:00
Ryan Ernst c2c5081830 Remove uneeded class loading stuff from hdfs plugin 2015-12-18 17:01:38 -08:00
Ryan Ernst 91fe99a7f6 Make hdfs plugin not use transitive deps 2015-12-18 16:52:22 -08:00
Costin Leau 7584810ff4 * Make plugin hadoop2-only
Polish MiniDFS cluster to be Hadoop2 (instead of Hadoop1) based
2015-12-19 01:35:53 +02:00
Ryan Ernst 690fb2cd3f Rename InternalFilters.Bucket to InternalFilters.InternalBucket to avoid name collision 2015-12-18 13:22:20 -08:00
Ryan Ernst 4ea19995cf Remove wildcard imports 2015-12-18 12:43:47 -08:00
Robert Muir 447729f0e1 add missing license headers 2015-12-18 13:08:17 -05:00
Robert Muir 2e2e328879 add missing license header 2015-12-18 13:02:39 -05:00
Martijn van Groningen 6dfcee6937 Added an internal reload pipeline api that makes sure pipeline changes are visible on all ingest nodes after modifcations have been made.
* Pipeline store can now only start when there is no .ingest index or all primary shards of .ingest have been started
* IngestPlugin adds`node.ingest` setting to `true`. This is used to figure out to what nodes to send the refresh request too. This setting isn't yet configurable. This will be done in a follow up issue.
* Removed the background pipeline updater and added added logic to deal with specific scenarious to reload all pipelines.
* Ingest services are no longer be managed by Guice. Only the bootstrapper gets managed by guice and that contructs
all the services/components ingest will need.
2015-12-18 18:24:27 +01:00
Martijn van Groningen e8a8e22e09 Add template infrastructure, removed meta processor and added template support to set and remove processor.
Added ingest wide template infrastructure to IngestDocument
Added a TemplateService interface that the ingest framework uses
Added a TemplateService implementation that the ingest plugin provides that delegates to the ES' script service
Cut SetProcessor over to use the template infrastructure for the `field` and `value` settings.
Removed the MetaDataProcessor
Removed dependency on mustache library
Added qa  ingest mustache rest test so that the ingest and mustache integration can be tested.
2015-12-18 17:35:53 +01:00
Martijn van Groningen a56902567e don't register rest actions on transport clients 2015-12-18 15:49:00 +01:00
Martijn van Groningen 2c5bb84851 fix copyDefaultGeoIp2DatabaseFiles task to work again 2015-12-18 15:45:56 +01:00
javanna 3e155f7b54 adapt to upstream changes: thirdPartyAudit.missingClasses set to true
geoip depends on asm and google http client which we don't need
2015-12-18 11:14:39 +01:00
javanna f349669071 adapt to upstream changes: enableMockModules => getMockPlugins 2015-12-18 11:13:34 +01:00
javanna 8bae93eee1 adapt to upstream changes: StringText => Text 2015-12-18 11:13:18 +01:00
javanna 885b01fb49 adapt to upstream changes: RestModule -> NetworkModule 2015-12-18 10:36:29 +01:00
javanna 5f2df6b95a Merge branch 'master' into feature/ingest 2015-12-18 10:34:07 +01:00
Simon Willnauer eca2435838 Merge branch 'master' into settings_prototype 2015-12-18 09:15:58 +01:00
Adrien Grand 6ea16671f4 Simplify the Text API.
We have the Text API, which is essentially a wrapper around a String and a
BytesReference and then we have 3 implementations depending on whether the
String view should be cached, the BytesReference view should be cached, or both
should be cached.

This commit merges everything into a single Text that is essentially the old
StringAndBytesText impl.

Long term we should look into whether this API has any performance benefit or
if we could just use plain strings. This would greatly simplify all our other
APIs that currently use Text.
2015-12-17 17:22:38 +01:00
Simon Willnauer eae3da3b54 Merge branch 'master' into settings_prototype 2015-12-17 15:13:41 +01:00
Adrien Grand 6ccc759691 Merge pull request #15480 from jpountz/fix/mapping_explicit_defaults
Make mapping serialization more robust.
2015-12-17 11:23:27 +01:00
Robert Muir a7cc91e868 Merge pull request #15501 from rmuir/sheisty_classes
thirdPartyAudit round 2
2015-12-17 03:44:27 -05:00
Robert Muir 6692e42d9a thirdPartyAudit round 2
This fixes the `lenient` parameter to be `missingClasses`. I will remove this boolean and we can handle them via the normal whitelist.
It also adds a check for sheisty classes (jar hell with the jdk).
This is inspired by the lucene "sheisty" classes check, but it has false positives. This check is more evil, it validates every class file against the extension classloader as a resource, to see if it exists there. If so: jar hell.

This jar hell is a problem for several reasons:

1. causes insanely-hard-to-debug problems (like bugs in forbidden-apis)
2. hides problems (like internal api access)
3. the code you think is executing, is not really executing
4. security permissions are not what you think they are
5. brings in unnecessary dependencies
6. its jar hell

The more difficult problems are stuff like jython, where these classes are simply 'uberjared' directly in, so you cant just fix them by removing a bogus dependency. And there is a legit reason for them to do that, they want to support java 1.4.
2015-12-17 02:35:00 -05:00
Jack Conradson 4523eaec88 Added plumbing for compile time script parameters.
Closes #15464
2015-12-16 18:29:21 -08:00
Robert Muir 4f9d4103f2 Merge pull request #15491 from rmuir/forbidden_third_party
Add gradle thirdPartyAudit to precommit tasks
2015-12-16 18:56:50 -05:00
Robert Muir 42138007db add some more comments about internal api usage 2015-12-16 18:56:02 -05:00
Robert Muir ee79d46583 Add gradle thirdPartyAudit to precommit tasks 2015-12-16 16:38:16 -05:00
Ryan Ernst a2b8f4b90a Merge pull request #15434 from rjernst/http_type
Expose http.type setting, and collapse al(most all) modules relating to transport/http
2015-12-16 11:54:30 -08:00
Simon Willnauer 71b204ea49 Merge branch 'master' into settings_prototype 2015-12-16 20:29:21 +01:00
Adrien Grand 8ac8c1f547 Make mapping serialization more robust.
When creating a metadata mapper for a new type, we reuse an existing
configuration from an existing type (if any) in order to avoid introducing
conflicts. However this field type that is provided is considered as both an
initial configuration and the default configuration. So at serialization time,
we might only serialize the difference between the current configuration and
this default configuration, which might be different to what is actually
considered the default configuration.

This does not cause bugs today because metadata mappers usually override the
toXContent method and compare the current field type with Defaults.FIELD_TYPE
instead of defaultFieldType() but I would still like to do this change to
avoid future bugs.
2015-12-16 16:08:45 +01:00
Martijn van Groningen 07951fc731 added comment why 'accessDeclaredMembers' permission is needed 2015-12-16 10:11:46 +01:00
Simon Willnauer 6ea266a89c Merge branch 'master' into settings_prototype 2015-12-15 16:33:01 +01:00
Adrien Grand d94bba2d9c Remove back compat for the `path` option.
The `path` option allowed to index/store a field `a.b.c` under just `c` when
set to `just_name`. This "feature" has been removed in 2.0 in favor of `copy_to`
so we can remove the back compat in 3.x.
2015-12-15 14:55:23 +01:00
Adrien Grand 50eeafa75c Make mappings immutable.
Today mappings are mutable because of two APIs:
 - Mapper.merge, which expects changes to be performed in-place
 - IncludeInAll, which allows to change whether values should be put in the
   `_all` field in place.

This commit changes both APIs to return a modified copy instead of modifying in
place so that mappings can be immutable. For now, only the type-level object is
immutable, but in the future we can imagine making them immutable at the
index-level so that mapping updates could be completely atomic at the index
level.

Close #9365
2015-12-15 10:20:28 +01:00
Martijn van Groningen e87709f593 fix ingest runner 2015-12-15 10:18:19 +01:00
Ryan Ernst 60d35c81af Plugins: Expose http.type setting, and collapse al(most all) modules relating to transport/http
This change adds back the http.type setting. It also cleans up all the
transport related guice code to be consolidated within the
NetworkModule (as transport and http related stuff is what and how ES
exposes over the network). The setter methods previously used by some
plugins to override eg the TransportService or HttpServerTransport are
removed, and those plugins should now register a custom implementation
of the class with a name and set that using the appropriate config
setting. Note that I think ActionModule should also be moved into here,
to sit along side the rest actions, but I left that for a followup.

closes #14148
2015-12-14 22:01:04 -08:00
Costin Leau 7bca97bba6 HDFS Snapshot/Restore plugin
Migrated from ES-Hadoop. Contains several improvements regarding:

* Security
Takes advantage of the pluggable security in ES 2.2 and uses that in order
to grant the necessary permissions to the Hadoop libs. It relies on a
dedicated DomainCombiner to grant permissions only when needed only to the
libraries installed in the plugin folder
Add security checks for SpecialPermission/scripting and provides out of
the box permissions for the latest Hadoop 1.x (1.2.1) and 2.x (2.7.1)

* Testing
Uses a customized Local FS to perform actual integration testing of the
Hadoop stack (and thus to make sure the proper permissions and ACC blocks
are in place) however without requiring extra permissions for testing.
If needed, a MiniDFS cluster is provided (though it requires extra
permissions to bind ports)
Provides a RestIT test

* Build system
Picks the build system used in ES (still Gradle)
2015-12-14 21:50:09 +02:00
Martijn van Groningen aaacf096d2 Merge remote-tracking branch 'es/master' into feature/ingest 2015-12-14 11:57:10 +01:00
Jason Tedor 3383c24be0 Remove and forbid use of Collections#shuffle(List) and Random#<init>()
This commit removes and now forbids all uses of
Collections#shuffle(List) and Random#<init>() across the codebase. The
rationale for removing and forbidding these methods is to increase test
reproducibility. As these methods use non-reproducible seeds, production
code and tests that rely on these methods contribute to
non-reproducbility of tests.

Instead of Collections#shuffle(List) the method
Collections#shuffle(List, Random) can be used. All that is required then
is a reproducible source of randomness. Consequently, the utility class
Randomness has been added to assist in creating reproducible sources of
randomness.

Instead of Random#<init>(), Random#<init>(long) with a reproducible seed
or the aforementioned Randomess class can be used.

Closes #15287
2015-12-11 11:16:38 -05:00
Martijn van Groningen d38cccb8a1 Fix issues after merging in master 2015-12-11 14:51:45 +01:00
Martijn van Groningen 503a166b71 Merge remote-tracking branch 'es/master' into feature/ingest 2015-12-11 14:32:16 +01:00
Robert Muir 2741888498 Remove RuntimePermission("accessDeclaredMembers")
Upgrades lucene to 5.5.0-1719088, randomizedtesting to 2.3.2, and securemock to 1.2
2015-12-10 14:26:55 -05:00
Boaz Leskes fafeb3abdd Introduce a common base response class to all single doc write ops
IndexResponse, DeleteResponse and UpdateResponse share some logic. This can be unified to a single DocWriteResponse base class. On top, some replication actions are now not about write operations anymore. This commit renames ActionWriteResponse to ReplicationResponse

Last some toXContent is moved from the Rest layer to the actual response classes, for more code re-sharing.

Closes #15334
2015-12-10 15:14:02 +01:00
Jack Conradson da5b07ae13 Added a new scripting language (PlanA).
Closes #15136
2015-12-09 16:32:37 -08:00
David Pilato 414fccb7d1 Merge branch 'fix/15268-proxy-auth' 2015-12-09 23:21:00 +01:00
javanna a8382de09d add comment to clarifiy why metadata fields can be set all the time to IndexRequest in PipelineExecutionService 2015-12-09 18:52:43 +01:00
javanna a95f81c015 avoid stripping out _source prefix when in _ingest context 2015-12-09 18:42:20 +01:00
javanna 57d6971252 Streamline support for get/set/remove of metadata fields and ingest metadata fields
Unify metadata map and source, add also support for _ingest prefix. Depending on the prefix, either _source, nothing or _ingest, we will figure out which map to use for values retrieval, but also modifications.
2015-12-09 18:36:44 +01:00
javanna 744d2908a8 avoid null values in simulate serialization prototypes, use empty maps instead 2015-12-09 18:36:44 +01:00
javanna 6b7446beb9 Remove sourceModified flag from IngestDocument
If one is using the ingest plugin and providing a pipeline id with the request, the chance that the source is going to be modified is 99%. We shouldn't worry about keeping track of whether something changed. That seemed useful at first so we can save the resources for setting back the source (map to bytes) when not needed. Also, we are trying to unify metadata fields and source in the same map and that is going to complicate how we keep track of changes that happen in the source only. Best solution is to remove the flag.
2015-12-09 18:36:43 +01:00
javanna b0d7d604ff Add support for transient metadata to IngestDocument
IngestDocument now holds an additional map of transient metadata. The only field that gets added automatically is `timestamp`, which contains the timestamp of ingestion in ISO8601 format. In the future it will be possible to eventually add or modify these fields, which will not get indexed, but they will be available via templates to all of the processors.

Transient metadata will be visualized by the simulate api, although they will never get indexed. Moved WriteableIngestDocument to the simulate package as it's only used by simulate and it's now modelled for that specific usecase.

 Also taken the chance to remove one IngestDocument constructor used only for testing (accepting only a subset of es metadata fields). While doing that introduced some more randomizations to some existing processor tests.

Closes #15036
2015-12-09 18:36:01 +01:00
javanna 5bc1e46113 setFieldValue for list to replace when an index is specified
It used to do add instead, which is not consistent with the behaviour of set, which always replaces.
2015-12-09 18:28:07 +01:00
javanna 8240031216 Merge branch 'master' into feature/ingest 2015-12-09 18:14:32 +01:00
Simon Willnauer a49120bfc1 fix compilation 2015-12-09 12:26:28 +01:00
Simon Willnauer 85a1b54867 fix compilation 2015-12-09 11:41:14 +01:00
Martijn van Groningen 233de434a0 Merge pull request #15310 from martijnvg/ingest/stream_put_and_delete_responses
Streamline put & delete pipeline responses with index & delete responses
2015-12-09 11:30:57 +01:00
Simon Willnauer c9d7c92243 fold ClusterSettingsService into ClusterSettings 2015-12-09 09:57:39 +01:00
Britta Weber e0aa481bf5 Merge pull request #15213 from brwe/copy-to-in-multi-fields-exception
throw exception if a copy_to is within a multi field

Copy to within multi field is ignored from 2.0 on, see #10802.
Instead of just ignoring it, we should throw an exception if this
is found in the mapping when a mapping is added. For already
existing indices we should at least log a warning.
We remove the copy_to in any case.

related to #14946
2015-12-08 14:41:07 +01:00
Simon Willnauer fbbb04b87e Add infrastructure to transactionally apply and reset dynamic settings
This commit adds the infrastructure to make settings that are updateable
resetable and changes the application of updates to be transactional. This means
setting updates are either applied or not. If the application failes all values are rejected.

This initial commit converts all dynamic cluster settings to make use of the new infrastructure.
All cluster level dynamic settings are not resettable to their defaults or to the node level settings.
The infrastructure also allows to list default values and descriptions which is not fully implemented yet.

Values can be reset using a list of key or simple regular expressions. This has only been implemented on the java
layer yet. For instance to reset all recovery settings to their defaults a user can just specify `indices.recovery.*`.

This commit also adds strict settings validation, if a setting is unknown or if a setting can not be applied the entire
settings update request will fail.
2015-12-08 14:39:15 +01:00
Martijn van Groningen a2cda4e3f2 Streamline the put and delete pipelines responses with the index and delete response in core. 2015-12-08 14:01:28 +01:00
David Pilato 7dcb40bcac Add support for proxy authentication for s3 and ec2
When using S3 or EC2, it was possible to use a proxy to access EC2 or S3 API but username and password were not possible to be set.

This commit adds support for this. Also, to make all that consistent, proxy settings for both plugins have been renamed:

* from `cloud.aws.proxy_host` to `cloud.aws.proxy.host`
* from `cloud.aws.ec2.proxy_host` to `cloud.aws.ec2.proxy.host`
* from `cloud.aws.s3.proxy_host` to `cloud.aws.s3.proxy.host`
* from `cloud.aws.proxy_port` to `cloud.aws.proxy.port`
* from `cloud.aws.ec2.proxy_port` to `cloud.aws.ec2.proxy.port`
* from `cloud.aws.s3.proxy_port` to `cloud.aws.s3.proxy.port`

New settings are `proxy.username` and `proxy.password`.

```yml
cloud:
    aws:
        protocol: https
        proxy:
            host: proxy1.company.com
            port: 8083
            username: myself
            password: theBestPasswordEver!
```

You can also set different proxies for `ec2` and `s3`:

```yml
cloud:
    aws:
        s3:
            proxy:
                host: proxy1.company.com
                port: 8083
                username: myself1
                password: theBestPasswordEver1!
        ec2:
            proxy:
                host: proxy2.company.com
                port: 8083
                username: myself2
                password: theBestPasswordEver2!
```

Note that `password` is filtered with `SettingsFilter`.

We also fix a potential issue in S3 repository. We were supposed to accept key/secret either set under `cloud.aws` or `cloud.aws.s3` but the actual code never implemented that.

It was:

```java
account = settings.get("cloud.aws.access_key");
key = settings.get("cloud.aws.secret_key");
```

We replaced that by:

```java
String account = settings.get(CLOUD_S3.KEY, settings.get(CLOUD_AWS.KEY));
String key = settings.get(CLOUD_S3.SECRET, settings.get(CLOUD_AWS.SECRET));
```

Also, we extract all settings for S3 in `AwsS3Service` as it's already the case for `AwsEc2Service` class.

Closes #15268.
2015-12-07 23:10:54 +01:00
Tal Levy e6b287c083 Merge pull request #15133 from talevy/the_one_true_field
[Ingest] Have processors operate one field at time.
2015-12-07 08:31:08 -08:00
Tal Levy 45f48ac126 update all processors to only operate on one field at a time when possible 2015-12-07 08:30:00 -08:00
Martijn van Groningen 6062c4eac9 Merge branch 'master' into feature/ingest 2015-12-07 15:53:39 +01:00
Robert Muir 2169a123a5 Filter classes loaded by scripts
Since 2.2 we run all scripts with minimal privileges, similar to applets in your browser.
The problem is, they have unrestricted access to other things they can muck with (ES, JDK, whatever).
So they can still easily do tons of bad things

This PR restricts what classes scripts can load via the classloader mechanism, to make life more difficult.
The "standard" list was populated from the old list used for the groovy sandbox: though
a few more were needed for tests to pass (java.lang.String, java.util.Iterator, nothing scary there).

Additionally, each scripting engine typically needs permissions to some runtime stuff.
That is the downside of this "good old classloader" approach, but I like the transparency and simplicity,
and I don't want to waste my time with any feature provided by the engine itself for this, I don't trust them.

This is not perfect and the engines are not perfect but you gotta start somewhere. For expert users that
need to tweak the permissions, we already support that via the standard java security configuration files, the
specification is simple, supports wildcards, etc (though we do not use them ourselves).
2015-12-05 21:46:52 -05:00
Robert Muir 46377778a9 Merge branch 'master' into getClassLoader 2015-12-04 15:58:36 -05:00
Robert Muir b0c64910b0 ban RuntimePermission("getClassLoader")
this gives more isolation between modules and plugins.
2015-12-04 15:58:02 -05:00
Ryan Ernst 01d48e2062 Merge branch 'master' into jigsaw 2015-12-04 11:29:49 -08:00
David Pilato 619fb998e8 Update Azure Service Management API to 0.9.0
Azure team released new versions of their Java SDK.

According to https://github.com/Azure/azure-sdk-for-java/wiki/Azure-SDK-for-Java-Features, it comes with 2 versions.
We should at least update to `0.9.0` of V1 but also consider moving to the new APIs (V2).

This commit first updates to latest API V1.

```xml
<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-svc-mgmt-compute</artifactId>
    <version>0.9.0</version>
</dependency>
```

Closes #15209
2015-12-04 17:32:11 +01:00