This commit removes all index level compatibilty and upgrade paths for
pre 2.0 indices. This includes:
* Remove leftover from delete_by_query to replay translog records,
since in 3.x all pending delelte_by_query instances are applied on the
upgrade to 2.x we can remove the bwc layer now.
* Remove Elasticsearch090PostingsFormat - we maintained our own posting format
until 2.0 this is now removed since folks need to upgrade to 2.x first before going
to 3.0
* Remove BloomFilterPostingFormat - this was only used for ID fields in the Elasticsearch090PostingsFormat
* Remove upgrade methods to pre 2.0 translogs without checkpoints
In #13971, the RenderSearchTemplateAction was made a cluster level action but the client methods
were not moved from the IndicesAdminClient. This is an inconsistency and this change cleans up the
inconsistency so that the client methods are now part of the ClusterAdminClient since the action is now
considered a cluster level action.
This test had to be moved to lang-groovy when groovy has been made a plugin.
I refactored it a bit to use mock plugins instead so that groovy is not
necessary anymore and it can come back to core.
The RenderSearchTemplateAction is currently a "indices" action, but the action does not really apply to
indices, it is more of a general action that is used to validate the search template. With our current
categorization of actions, `cluster:` and `indices:`, `cluster:` is a more appropriate type as this action
is not associated with indices. The classes are also moved to a cluster package.
GeoPoint now has native support in StreamOutput/StreamInput
impementing Writable is not necessary. This also adds tests
for XContentBuilder rendering GeoPoint
This fixes license checks to apply to all java files under src/ as opposed to
only those in the org.elasticsearch package. It found some license headers that
had to be reformatted. I also added a missing license header to Nullability.java
however this has not be caught by the license checker since it ignores guice
files.
Relates to #13703
If the plugin manager cannot successfully install a plugin, ensure
that every directory is cleaned up again. This includes
plugins/foo
config/foo
bin/foo
Closes#12749
When shard becomes active again, immediately increase its indexing buffer instead of waiting for up to 30 seconds while indexing with a tiny (500 KB) indexing buffer.
This commit removes and now forbids all uses of
com.google.common.collect.EvictingQueue across the codebase. This is
one of the few remaining steps in the eventual removal of Guava as a
dependency.
Relates #13224
Long running TransportNodesAction requests can retain old cluster states in memory for much longer than needed. This can cause nodes with frequent cluster state updates and long running requests to run out of memory.
This commit removes and now forbids all uses of
com.google.common.net.InetAddresses across the codebase. This is one of
the few remaining steps in the eventual removal of Guava as a
dependency.
Relates #13224
Currently it's not possible to specify a timeout for nodes operations (such as node info, node stats, cluster stats and hot threads) via REST-based APIs.
When running in GCE platform, an instance has access to:
http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip
Which gives back the private IP address, for example `10.240.0.2`.
http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/externalIp
Gives back the public Ip address, for example `130.211.108.21`.
As we have for `ec2`, we can support new network host settings:
* `_gce:privateIp:X_`: The private IP address of the machine for a given network interface.
* `_gce:hostname_`: The hostname of the machine.
* `_gce_`: Same as `_gce:privateIp:0_` (recommended).
Closes#13605.
Closes#13590.
BTW resolveIfPossible now throws IOException so code is also updated for ec2 discovery and
some basic tests have been added.
We previously removed the ability to specify metadata fields inside
documents in #11074, but the backcompat left leniency that allowed this
to still occur. This change locks down parsing so any metadata field
found while parsing a document results in an exception. This only
affects 2.0+ indexes; backcompat is maintained.
closes#13740
ShadowEngine doesn't have a translog but instead throws an
UOE when it's requested. ShadowIndexShard should not try to pull
stats for the translog either and should return null instead.
Closes#12730
SimilarityModule was binding two different interfaces SimilaritySerivce and SimilarityLookupSerice.
Both used a class Similarities which was holding some default impls etc. Thit commit folds
all the logic into a rather simplified SimilarityService which has not construction time dependency to
any other service in the system anymore. It's soely constructued from custom similarities, the index name
and index settings and SimilarityModule is just trivial glue code with out much logic.
This also adds a simple unittest for basic test coverage of the service.
This commit removes and now forbids all uses of
com.google.common.collect.Iterators across the codebase. This is one of
the final steps in the eventual removal of Guava as a dependency.
Relates #13224
The geoPoint gets normalized while calling toQuery. That should happen on a copy of the point though, the state of the request should never change as part of toQuery execution. Also updated corresponding test to support point normalization.
Closes#13984
The `_create` API is handy way to specify an index operation should only be done if the document doesn't exist. This is currently implemented in explicit code paths all the way down to the engine. However, conceptually this is no different than any other versioned operation - instead of requiring a document is on a specific version, we require it to be deleted (or non-existent). This PR removes Engine.Create in favor of a slight extension in the VersionType logic.
There are however a couple of side effects:
- DocumentAlreadyExistsException is removed and VersionConflictException is used instead (with an improved error message)
- Update will reject version parameters if the upsert option is used (it doesn't compute anyway).
- Translog.Create is also removed infavor of Translog.Index (that's OK because their binary format was the same, so we can just read Translog.Index of the translog file)
Closes#13955
It is rarely used and was not consistently handled by different distributions anyway.
This commit also adds a test for specifying CONF_DIR when installing plugins and
starting elasticsearch.
relates to #12712 and #12954closes#5329closes#13715
Mappings and many modules are not needed here compared to AbstractQueryTestCase, as we will never call toQuery in this other test. Parsing is independent from indices and types as well.
For minScore and terminateAfter we can just rely on defaults set to SearchSourceBuilder. Also simplified toString representation of CountRequestBuilder
Plugin cli tools configures logging with whatever is in the logging.yml.
If a file appender is configured for any of the logs this will cause creation
of an empty log file. If a plugin was for example installed as root it will
create empty logs at es.home/logs.
This is problematic when for example plugins are installed as root and es is run
as service. Logs will then be created in /usr/share/elasticsearch/logs
and can later not be removed by for example dpkg -r or -purge.
To avoid this, configure the logger to use an appender that writes to the same
output that plugin cli tool does. This allows other components that are called
from Plugin cli tool to write to the same terminal that plugin cli tool writes to
by using the logging mechanism already in place.
The logging conf is not read at all pb plugin cli tool.
As a side effect, the loging level for components that are called
from the plugin command such as the jar hell check can now be configured
with -Des.logger.level which makes it easier to debug the jar hell check.
The field is optional everywhere else but in the serialization methods, which causes problems. Also expanded tests so that they can catch this type of problem.
Closes#13963
These abstractions don't really do anything nor can they be extended.
We can just fold them into IndexModule for now. There are more but they
are tricky due to some test dependencies which I need to resolve first.
Types are still optional, but if you do provide them, they can't be null. Split the existing constructor that accepted nnull into two, one that accepts no arguments, and another one that accepts the types argument, which must be not null.
Also trimmed down different ways of setting ids, some were misleading as they would always add the ids to the existing ones and not set them, the add prefix makes that clear. Left `addIds` method that accepts a varargs argument. Added check for ids not be null.
I'm not 100% sure we should remove it as part of the pull request to drop
ImmutableMap. It might be more prudent to change its return type to map
and its implementation to an unmodifiable copy of the map being built
and then remove all consumers after banning ImmutableMap.
Today we use a hirachical injector on the shard level for each shard
created. This commit removes the shard level injetor and replaces
it with good old constructor calls. This also removes all shard level plugin
facilities such that plugins can only have node or index level modules.
For plugins that need to track shard lifecycles they should use the relevant
callback from the lifecycle we already provide.
This commit removes and now forbids all uses of
com.google.common.hash.HashCode, com.google.common.hash.HashFunction,
and com.google.common.hash.Hashing across the codebase. This is one of
the few remaining steps in the eventual removal of Guava as a
dependency.
Relates #13224
The fix in #13848 has an off by one issue where the first byte of the checksum
was never written. Unfortunately most tests shadowed the problem and the first
byte of the checksum seems to be very likely a 0 which causes only very rare
failures.
Relates to #13896
Relates to #13848
Right now, we allow allocation if there is only a single node in the
cluster. it would be nice to fail open when there is only one data node
(instead of only one node total).
closes#9391
Lucene's RateLimiter can do too much sleeping on small values (see also #6018).
The issue here is that calls to "pause" are not properly guarded in "restoreFile".
Instead of simply adding the guard, this commit uses the RateLimitingInputStream similar as for "snapshotFile".
Closes#13828
Today we are relying on calling Store.verify on the closed stream to validate the
checksum. This is still necessary to catch file truncation but for an actually corrupted
file or checksum we can fail early and check the checksum against the actual metadata
once it's been fully written to the VerifyingIndexOutput.
This commit removes and now forbids all uses of
com.google.common.collect.ImmutableCollection across the codebase. This
is one of the final steps in the eventual removal of Guava as a
dependency.
Relates #13224
This commit removes and now forbids all uses of
com.google.common.io.Resources across the codebase. This is one of the
few remaining steps in the eventual removal of Guava as a dependency.
Relates #13224
Mostly favoring unmodifiableMap and making sure to only wrap maps that aren't
otherwise returned and so cannot be copied.
MapMaker#immutableMap has to go next.
We're concerned that unmodifiableMap uses significantly more memory than
ImmutableMap did - especially in cluster state - so we ban it there outright
and move to ImmutableOpenMap.
Removes ClusterState$Builder#routingTable(RoutingTable$Builder) because that
method had the side effect of building the routing table which can only be
done once per RoutingTable$Builder now that it uses ImmutableOpenMap.
Closes#13880
Squashed commit of the following:
commit 316a328e5032e580ba840db993d907631334aac0
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 30 16:57:47 2015 -0400
windows is terrible
commit 0406b560c58bf833f8d77af9c7cf3386771dd9c5
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 30 16:43:09 2015 -0400
Nuke ES_CLASSPATH appending
Out of box, ES expects its stuff to be in particular places. We should not be appending to ES_CLASSPATH, allowing users to specify stuff there, like we do in elasticsearch.bin.sh
If the user sets it, its not going to work out of box.
Closes#13812
commit 415d8972df28eddec322bb6d70100a1993fa95f6
Author: Robert Muir <rmuir@apache.org>
Date: Wed Sep 30 16:26:35 2015 -0400
Fail hard on empty classpath elements.
This can happen easily, if somehow old 1.x shellscripts survive and try to launch 2.x code.
I have the feeling this happens maybe because of packaging upgrades or something.
Either way: we can just fail hard and clear in this situation, rather than the current situation
where CWD might be /, and we might traverse the entire filesystem until we hit an error...
Relates to #13864
When the plugin manager does not find in `plugin-descriptor.properties` the exact same elasticsearch version it was built on
as the current elasticsearch version, it fails with a message like:
```
ERROR: Elasticsearch version [2.0.0-beta1] is too old for plugin [elasticsearch-mapper-attachments]
```
Actually, the message should be:
```
Plugin [elasticsearch-mapper-attachments] is incompatible with Elasticsearch [2.0.0.beta2]. Was designed for version [2.0.0.beta1].
```
The opposite is true. If you try to install a version of a plugin which was built with a newer version of elasticsearch, it will fail the same way:
```
Plugin [elasticsearch-mapper-attachments] is incompatible with Elasticsearch [2.0.0.beta1]. Was designed for version [2.0.0.beta2].
```
This commit adds supports for expiration after writes to Cache. This
enables entries to expire after they were initially placed in the cache
without prolonging their life on retrieval. Replacements are considered
new writes.
This commit removes and now forbids all uses of
com.google.common.cache.Cache, com.google.common.cache.CacheBuilder,
com.google.common.cache.RemovalListener,
com.google.common.cache.RemovalNotification,
com.google.common.cache.Weigher across the codebase. This is a major
step in the eventual removal of Guava as a dependency.
Relates #13224
We try to get the index shard instance again from the index service on a different
threads while that shard might have already been closed or removed which can cause a NPE
instead of another expected expecption.
This commit shuffels and rewrites some code in RecoverySourceHandler to make it
simpler and more unittestable. This commit doesn't change all parts of this class
neither is it fully tested yet. It's an important part of the infrastrucutre so I started
to make it better tested but I don't want to change everything in one go since it makes
review simpler and more detailed. Future commits will continue cleaning up the class and
add more tests.
If a document is indexed into ES with no id, ES will generate one for it. We used to have an optimization for this case where the engine will not try to resolve the ids of these request in the existing index but immediately try to index them. This optimization has proven to be the source of brittle bugs (solved!) and we disabled it in 1.5, preparing for it to be removed if no performance degradation was found. Since we haven't seen any such degradation we can remove it.
Along with the removal of the optmization, we can remove the autogenerate id flag on indexing requests and the can have duplicate flag. The only downside of the removal of the canHaveDuplicate flag is that we can't make sure any more that when we retry an autogenerated id create operation we will ignore any document already exists exception (See #9125 for background and discussion). To work around this, we don't set the operation to CREATE any more when we generate an id, so the resulting request will never fail when it finds an existing doc but do return a version of 2. I think that's acceptable.
Closes#13857
We have two unneded heavy dependencies on IndexShard that are unneeded and only cause
trouble if you try to mock index shard. This commit removes IndexSettingsService as well as
ClusterSerivce from IndexShard to simplify future mocking and construction.
While refactoring has_child and has_parent query we lost an important detail around types. The types that the inner query gets executed against shouldn't be the main types of the search request but the parent or child type set to the parent query. We used to use QueryParseContext#setTypesWithPrevious as part of XContentStructure class which has been deleted, without taking care though of setting the types and restoring them as part of the innerQuery#toQuery call.
Meanwhile also we make sure that the original context types are restored in PercolatorQueriesRegistry
Closes#13863
Closes#13854
Squashed commit of the following:
commit 42c1166efc55adda0d13fed77de583c0973e44b3
Author: Robert Muir <rmuir@apache.org>
Date: Tue Sep 29 11:59:43 2015 -0400
Add paranoia
Groovy holds on to a classloader, so check it before compilation too.
I have not reviewed yet what Rhino is doing, but just be safe.
commit b58668a81428e964dd5ffa712872c0a34897fc91
Author: Robert Muir <rmuir@apache.org>
Date: Tue Sep 29 11:46:06 2015 -0400
Add SpecialPermission to guard exceptions to security policy.
In some cases (e.g. buggy cloud libraries, scripting engines), we must
grant dangerous permissions to contained cases. Those AccessController blocks
are dangerous, since they truncate the stack, and can allow privilege escalation.
This PR adds a simple permission to check before each one, so that unprivileged code
like groovy scripts, can't do anything they shouldn't be allowed to do otherwise.
StoreRecoveryService used to be a pretty heavy class with lots of dependencies.
This class was basically not testable in isolation and had an async API with a listener.
This commit refactors this class to be a simple utility classs with a sync API hidden behind
the IndexShard interface. It includes single node tests and moves all the async properities to
the caller side.
Note, this change also removes the mapping update on master from the store recovery code since
it's not needed anymore in 3.0 because all stores have been subject to sync mapping updates such
that the master already has all the mappings for documents that made it into the transaction log.
Closes#13766
Now that groovy is factored out, we contain this dangerous stuff there.
TODO: look into those test hacks inspecting class protection domains, maybe we can
clean that one up too.
TODO: generalize the GroovyCodeSourcePermission to something all script engines check,
before entering accesscontrollerblocks. this way e.g. groovy script cannot coerce
python engine into creating something with more privs if it gets ahold of it... we
should probably protect the aws/gce hacks in the same way.
Previous to this change if to equal geohash cell query builders were created and then toQuery was called on one, they would no longer be equal.
This change also adds a test to AbstractQueryTestCase to make sure calling toQuery on any query builder does not affect the query builder's equality
* Add OS X support via "seatbelt" mechanism. This gives consistency across dev and prod, since many devs use OS X.
* block execveat system call: it may be new, but we should not allow it.
When a shard becomes in active we trigger a sync flush in order to speed up future recoveries. The sync flush causes a new translog generation to be made, which in turn confuses the IndexingMemoryController making it think that the shard is active. If no documents comes along in the next 5m, the shard is made inactive again , triggering a sync flush and so forth.
To avoid this, the IndexingMemoryController is changed to ignore empty translogs when checking if a shard became active. This comes with the price of potentially missing indexing operations which are followed by a flush. This is acceptable as if no more index operation come in, it's OK to leave the shard in active.
A new unit test is introduced and comparable integration tests are removed.
Closes#13802
Instead just log the same thing we print to the startup console for that case (magic logic),
it sucks to do this, but guice exceptions are too much.
All other non-guice exceptions will still be fully logged.
Closes#13782
This commit adds explicit ids for managing ElasticsearchException
serialization. By adding explicit ids and unit tests for them, the ids
are less brittle and breakage can be more clearly detected.
Today we use reflection where it's not needed anymore since java8 can
pass ctors around. This commit replaces runtime checks with compile time
checks which is always preferrable.
These classes are really duplicates and are just here for historical reasons.
We don't need these anymore since the same classes exist in lucene today.
This also removes the guice injection for DeletionPolicy and make them shard private.