There have been security issues with tika's parsers in the past...
let's take away the network, filesystem, everything we can.
In some way, parsing these docs is a lot like executing untrusted code.
I know its not pretty, but I think its worth it.
This patch adds a zip of about 200 files from tika's test suite,
and we assert some content comes back from each. This is a good exercise
of the various formats.
I removed any huge files to try to keep size reasonable, but we want
a bit of a variety so we know stuff is working.
I fixed issues with the parser config by running this.
this removes a lot of obscure parsers, and leaves us with the basics.
This includes at least all of the formats listed on
https://github.com/elastic/elasticsearch-mapper-attachments/issues/163
I will start adding tests for each one of these document formats,
and take it as it goes and see what trouble we run into.
Closes#163
The plugin name currently defaults to the gradle project name. But the
gradle project name for standalone repo (like an external plugin would
be) defaults to the directory name of the repo. This is trappy, since it
depends on how the repo was checked out.
This change enforces the plugin name is always set.
closes#14603
The completion suggester provides auto-complete/search-as-you-type functionality.
This is a navigational feature to guide users to relevant results as they are typing, improving search precision.
It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters.
The completions are indexed as a weighted FST (finite state transducer) to provide fast Top N prefix-based
searches suitable for serving relevant results as a user types.
closes#10746
Many of the tests were not running, or did not check the exceptions.
I renamed all tests to meet *Tests* so they run, and assert exception messages.
Also because we must (currently) invoke tika with additional privileges, I added
the security logic, and fixed unit testing to call our static method directly.
This must be package private for security reasons, i simply put everything in
org.elasticsearch.mapper.attachments package.
I upgraded tika to the latest, so we are up to date, and removed logic around
tika == null and old locale issues.
This makes it a groovy project that works in eclipse.
You will have to install a plugin for groovy language support
(I used a snapshot build from https://github.com/groovy/groovy-eclipse/wiki)
Random code shouldn't be listening on sockets elsewhere.
Today its the wild west, but we only need to grant access to what the user configured.
This means e.g. multicast plugin has to declare its intentions in its security.policy
Closes#14549
Many other improvements:
* Use spaces in ES path
* Use space in path for plugin file installation
* Use a different cwd than ES home
* Use jps to ensure process being stopped is actually elasticsearch
* Stop ES if pid file already exists
* Delete pid file when successfully killed
Also, refactored the cluster formation code to be a little more organized.
closes#14464
Closes#14595
Squashed commit of the following:
commit d0b2b262e9dcdbc2aee163b9a84db082c8b5b96b
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 6 22:36:54 2015 -0500
Switch to JarInputStream, to contain suppressforbidden. Also add a test that fails if path is not accessible (regardless of whether its a jar)
commit f99c1d240db23ceb2a06987b3bd69eae0229550b
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 6 22:16:16 2015 -0500
remove leniency in i/o here
commit b160d4303ee81a8c9298729596ecbc893f5f8894
Author: Robert Muir <rmuir@apache.org>
Date: Fri Nov 6 21:58:21 2015 -0500
Fix Build to correctly treat URLs and to not leak a file handle
Eclipse does not have the ability to differentiate test dependencies
from main dependencies. This causes what looks like a circular
dependency through test-framework. This change sets up an additional
core-tests project for eclipse only, which removes this problem.
This commit prevents running rebalance operations if the store allocator is
still fetching async shard / store data to prevent pre-mature rebalance decisions
which need to be reverted once shard store data is available. This is typically happening
on rolling restarts which can make those restarts extremely painful.
Closes#14387
This commit adds the abstraction layer to GeoPointFieldMapper needed to cut over to Lucene 5.4's new GeoPointField type while maintaining backward compatibility with 'legacy' geo_point indexes.
This commit addresses an issue in TransportBroadcastByNodeAction.
Namely, if in between a master leaving the cluster and a new one being
assigned, a request that relies on TransportBroadcastByNodeAction
(e.g., an indices stats request) is issued,
TransportBroadcastByNodeAction might attempt to send a request to a
node that is no longer in the local node’s cluster state.
The exact circumstances that lead to this are as follows. When the
master leaves the cluster and another node’s master fault detection
detects this, the local node will update its local cluster state to no
longer include the master node. However, the routing table will not be
updated. This means that in the preparation for sending the requests in
TransportBroadcastByNodeAction, we need to check that not only is a
shard assigned, but also that it is assigned to a node that is still in
the local node’s cluster state.
This commit adds such a check to the constructor of
TransportBroadcastByNodeAction. A new unit test is added that checks
that no request is sent to the master node in such a situation; this
test fails with a NullPointerException without the fix. Additionally,
the unit test TransportBroadcastByNodeActionTests#testResultAggregation
is updated to also simulate a master failure. This updated test also
fails prior to the fix.
Closes#14584