10619 Commits

Author SHA1 Message Date
Britta Weber 0a07ce8916 core: disable auto gen id optimization
This pr removes the optimization for auto generated ids.
Previously, when ids were auto generated by elasticsearch then there was no
check to see if a document with same id already existed and instead the new
document was only appended. However, due to lucene improvements this
optimization does not add much value. In addition, under rare circumstances it might
cause duplicate documents:

When an indexing request is retried (due to a lost connection, node closed, etc.),
then a flag 'canHaveDuplicates' is set to true for the indexing request
that is sent a second time. This was to make sure that even
when an indexing request for a document with autogenerated id comes in
we do not have to update unless this flag is set and instead only append.

However, it might happen that for a retry or for the replication the
indexing request that has canHaveDuplicates set to true (the retried request) arrives
at the destination before the original request that has it set to false.
In this case both requests add a document and we end up with a duplicated document.
This commit adds a workaround: remove the optimization for auto
generated ids and always update the document.
The assumption is that this will not slow down indexing by more than 10 percent,
see: http://benchmarks.elasticsearch.org/

closes #8788
closes #9468
2015-01-29 16:26:04 +01:00
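
The race described in commit 0a07ce8916 can be illustrated with a toy model. This is a minimal sketch, not Elasticsearch code: `ToyShard`, `optimize_auto_ids`, and `replay_out_of_order` are hypothetical names, and the model only mimics the "append unless the flag is set" behaviour described in the message.

```python
# Toy model of a shard's document store. All names are hypothetical and do
# not mirror Elasticsearch's internal classes.
class ToyShard:
    def __init__(self, optimize_auto_ids):
        self.optimize_auto_ids = optimize_auto_ids
        self.docs = []  # list of (doc_id, source) pairs in write order

    def index(self, doc_id, source, can_have_duplicates):
        if self.optimize_auto_ids and not can_have_duplicates:
            # Optimized path: append without checking for an existing id.
            self.docs.append((doc_id, source))
            return
        # Update path: replace any existing document with the same id.
        self.docs = [d for d in self.docs if d[0] != doc_id]
        self.docs.append((doc_id, source))


def replay_out_of_order(shard):
    # The retried request (flag set) arrives before the original (flag unset).
    shard.index("auto-1", {"msg": "hello"}, can_have_duplicates=True)
    shard.index("auto-1", {"msg": "hello"}, can_have_duplicates=False)
    return len(shard.docs)


with_optimization = replay_out_of_order(ToyShard(optimize_auto_ids=True))
without_optimization = replay_out_of_order(ToyShard(optimize_auto_ids=False))
print(with_optimization, without_optimization)
```

With the optimization enabled, the late-arriving original request is appended blindly and the toy shard ends up with two copies. Always taking the update path keeps a single copy regardless of arrival order.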
Oliver e412dab63a Docs: Fix sample query
Closes #9472
2015-01-29 15:56:24 +01:00
Simon Willnauer 15a766084d [CACHE] Use correct number of bytes in query cache accounting
Today we use the length of the BytesReference, which is misleading since
the reference is paged such that the length != ramBytesUsed. This can lead
to a way higher memory consumption than expected if query results are tiny,
since each query result requires at least 16kb. We should still rethink this
strategy for query results that are very small, i.e. less than 20% of the ramBytesUsed,
but this commit first tries to make the accounting correct.
2015-01-29 10:59:36 +01:00
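
The gap between the two accounting schemes in commit 15a766084d can be sketched as follows. This is an illustration under stated assumptions: the 16 KiB page size is taken from the "at least 16kb" remark in the message, and `ram_bytes_used` and `accounted_bytes` are hypothetical helpers, not the cache's real API.

```python
PAGE_SIZE_BYTES = 16 * 1024  # assumed page size, from the "at least 16kb" remark


def ram_bytes_used(length):
    # A paged buffer allocates whole pages, so even a tiny result holds one page.
    pages = max(1, -(-length // PAGE_SIZE_BYTES))  # ceiling division
    return pages * PAGE_SIZE_BYTES


def accounted_bytes(lengths, use_ram_bytes):
    # Sum either the logical lengths or the memory actually held by the pages.
    return sum(ram_bytes_used(n) if use_ram_bytes else n for n in lengths)


tiny_results = [200] * 1000  # one thousand cached results of 200 bytes each
by_length = accounted_bytes(tiny_results, use_ram_bytes=False)
by_ram = accounted_bytes(tiny_results, use_ram_bytes=True)
print(by_length, by_ram)
```

In this toy setup, accounting by length reports 200,000 bytes while the pages hold 16,384,000 bytes, which is the kind of underestimate the commit message describes.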
Simon Willnauer 4917121de2 Remove Unused code and remove unnecessary abstraction
HashedBytesArray is not used anymore, and Releasable only makes sense on
the paged implementation, so the marker interface is unneeded.
2015-01-29 09:51:14 +01:00
Lee Hinman 86e52c30a1 Make `script.groovy.sandbox.method_blacklist_patch` truly append-only
Additionally, this setting can be specified in elasticsearch.yml if
desired, to pre-populate the list of methods to be added to the default
blacklist.

When making a change to this setting dynamically, the entire blacklist
is logged as well.
2015-01-28 17:09:27 -07:00
Ryan Ernst afcedb94ed Mappings: Remove `index_analyzer` setting to simplify analyzer logic
The `analyzer` setting is now the base setting, and `search_analyzer`
is simply an override of the search time analyzer.  When setting
`search_analyzer`, `analyzer` must be set.

closes #9371
2015-01-28 13:43:15 -08:00
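
The rule from commit afcedb94ed (`analyzer` is the base, `search_analyzer` is an optional search-time override that requires `analyzer`) can be sketched as a small resolver. This is an assumption-laden illustration: `resolve_analyzers` and the returned keys are hypothetical, and the real mapping parser may report the error differently.

```python
def resolve_analyzers(field_mapping):
    # `analyzer` is the base setting; `search_analyzer` only overrides search time.
    analyzer = field_mapping.get("analyzer")
    search_analyzer = field_mapping.get("search_analyzer")
    if search_analyzer is not None and analyzer is None:
        # Hypothetical error type; the commit only states that analyzer must be set.
        raise ValueError("search_analyzer requires analyzer to be set")
    return {"index": analyzer, "search": search_analyzer or analyzer}


both = resolve_analyzers({"analyzer": "standard", "search_analyzer": "simple"})
base_only = resolve_analyzers({"analyzer": "standard"})
print(both, base_only)
```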
Lee Hinman cc461a837f Avoid NullPointerException if optional Groovy jar is removed 2015-01-28 13:49:50 -07:00
Lee Hinman c610524392 Make groovy sandbox method blacklist dynamically additive
Using the `script.groovy.sandbox.method_blacklist_patch` setting, the
blacklist can be dynamically *added* to by specifying a comma-separated
list of methods (for example, "toString,size" would add .toString and
.size to the blacklist).

When the `script.groovy.sandbox.method_blacklist_patch` setting is
changed, the script cache is cleared to force new scripts to be
recompiled. Additionally the on-disk cache is cleared so that scripts in
the `config/scripts` directory are re-compiled as well.

This also fixes an issue where script engines were injected more than
once, which can cause multiple instances of the script engine per node.
2015-01-28 12:26:09 -07:00
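
The additive behaviour described in commit c610524392 can be sketched as a set union over a parsed comma-separated value. This is a minimal sketch: `apply_blacklist_patch` is a hypothetical helper, and the entries in `default_blacklist` are placeholders rather than the real default list.

```python
def apply_blacklist_patch(current, patch_value):
    # Parse the comma-separated setting value, ignoring blanks and whitespace.
    additions = {m.strip() for m in patch_value.split(",") if m.strip()}
    # Union only: entries can be added but never removed.
    return frozenset(current) | additions


default_blacklist = frozenset({"getClass", "wait"})  # placeholder entries
patched = apply_blacklist_patch(default_blacklist, "toString,size")
patched_again = apply_blacklist_patch(patched, "size")
print(sorted(patched))
```

Applying a second patch that names only `size` leaves the set unchanged, which matches the append-only intent: a later value cannot shrink the blacklist.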
Zachary Tong a4eb1d5505 Aggregations: Add standard deviation bounds to extended_stats
Extended_stats now displays the upper and lower bounds on standard deviations (e.g. avg +/- std).
Default is to show 2 std above/below, but can be changed using the `sigma` parameter.
The `sigma` parameter accepts non-negative doubles.

Closes #9356
2015-01-28 11:47:20 -05:00
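
The bounds added in commit a4eb1d5505 amount to `avg +/- sigma * std`. The sketch below assumes a population standard deviation; the function name `std_deviation_bounds` and the result keys are illustrative and not taken from the actual response format.

```python
import math


def std_deviation_bounds(values, sigma=2.0):
    # sigma must be a non-negative double; the default of 2 follows the commit message.
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    avg = sum(values) / n
    # Population variance (an assumption of this sketch), clamped against rounding.
    variance = max(sum(v * v for v in values) / n - avg * avg, 0.0)
    std = math.sqrt(variance)
    return {"upper": avg + sigma * std, "lower": avg - sigma * std}


bounds = std_deviation_bounds([2, 4, 4, 4, 5, 5, 7, 9])
print(bounds)
```

For the sample values the mean is 5 and the standard deviation is 2, so the default `sigma` of 2 gives bounds of 1 and 9.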
gmarz 3e4fc2659d Nodes Stats: Fix open file descriptors count on Windows
Closes #1563
2015-01-28 10:30:02 -05:00
J Charitopoulos b359520849 Docs: Update snapshots.asciidoc
minor syntax

Closes #9457
2015-01-28 15:54:13 +01:00
Nicholas Knize 9622f78fe6 Revert "[GEO] Update GeoPolygonFilter to handle ambiguous polygons"
This reverts commit 06667c6aa8, which introduces an undesirable dependency on JTS.
2015-01-28 08:03:26 -06:00
Clinton Gormley 8978aa5465 Docs: Improved the template query docs
Added the `file` and `id` parameters.

Closes #9458
2015-01-28 14:19:59 +01:00
Colin Goodheart-Smithe 29c24d75e7 Aggregations: Unify histogram implementations
This change makes InternalHistogram the only InternalAggregation used by the Histogram Aggregator. There is still a separate Bucket implementation and Factory implementation. All buckets are created through the factory passed into the InternalHistogram, and the correct factory implementation is serialised as part of the aggregation to make sure the correct bucket types are always generated.

This is needed by the Transformers (namely the derivative transformer) to allow it to generate buckets of the right type without having to know what the underlying bucket implementation is.
2015-01-28 10:45:28 +00:00
Boaz Leskes 1695f76f68 Test: testOldIndexes should disable merging
It verifies that some segments need to be upgraded, but if they are merged away, they are upgraded implicitly.
2015-01-28 11:34:58 +01:00
Boaz Leskes 22a576d5ba Recovery: flush immediately after a remote recovery finishes (unless there are ongoing ones)
To properly replicate, we currently stop flushing during recovery so we can replay the translog once copying files is done. Once recovery is done, the translog will be flushed by a background thread that, by default, kicks in every 5s. In case of a recovery failure and a quick re-assignment of a new shard copy, we may fail to flush before starting a new recovery, causing it to deal with a potentially even longer translog. This commit makes sure we flush immediately when the ongoing recovery count goes to 0.

I also added a simple recovery benchmark.

Closes #9439
2015-01-28 09:14:23 +01:00
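
The "flush when the ongoing recovery count goes to 0" rule from commit 22a576d5ba can be sketched with a simple counter. This is a single-threaded toy: `ToyRecoveryTracker` and its methods are hypothetical, and a real implementation would need thread-safe counting.

```python
class ToyRecoveryTracker:
    def __init__(self):
        self.ongoing_recoveries = 0
        self.flush_count = 0

    def recovery_started(self):
        # Flushing is held back while at least one recovery is in flight.
        self.ongoing_recoveries += 1

    def recovery_finished(self):
        self.ongoing_recoveries -= 1
        if self.ongoing_recoveries == 0:
            # Flush right away instead of waiting for the periodic background flush.
            self.flush()

    def flush(self):
        self.flush_count += 1


tracker = ToyRecoveryTracker()
tracker.recovery_started()
tracker.recovery_started()
tracker.recovery_finished()
flushes_with_one_ongoing = tracker.flush_count
tracker.recovery_finished()
flushes_after_last = tracker.flush_count
print(flushes_with_one_ongoing, flushes_after_last)
```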
Igor Motov 13ef7d73b9 Snapshot/Restore: better handling of index deletion during snapshot
If an index is deleted during initial state of the snapshot operation, the entire snapshot can fail with NPE. This commit improves handling of this situation and allows snapshot to continue if partial snapshots are allowed.

Closes #9024
2015-01-27 21:06:29 -05:00
Boaz Leskes 3512860956 Test: always use replicas in testClusterInfoServiceInformationClearOnError
It assumes the local node always has a shard.
2015-01-28 00:23:03 +01:00
Nicholas Knize 06667c6aa8 [GEO] Update GeoPolygonFilter to handle ambiguous polygons
PR #8672 addresses ambiguous polygons - those that either cross the dateline or span the map - by complying with the OGC standard right-hand rule. Since ```GeoPolygonFilter``` is self contained logic, the fix in #8672 did not address the issue for the ```GeoPolygonFilter```. This was identified in issue #5968

This fixes the ambiguous polygon issue in ```GeoPolygonFilter``` by moving the dateline crossing code from ```ShapeBuilder``` to ```GeoUtils``` and reusing the logic inside the ```pointInPolygon``` method.  Unit tests are added to ensure support for coordinates specified in either standard lat/lon or great-circle coordinate systems.

closes #5968
closes #9304
2015-01-27 15:45:05 -06:00
Boaz Leskes 9ac6d78308 Internal: ClusterInfoService should wipe local cache upon unknown exceptions
The InternalClusterInfoService reaches out to the nodes to get information about their disk usage and shard store size. Upon a node level error we currently remove the node info from the local cache. We should also clear the cache when we run into an error on the action level (excluding any info from all nodes).

 This also adds settings for the timeout used when waiting for nodes.

Closes #9449
2015-01-27 22:38:08 +01:00
Lee Hinman 2f6527f491 [DOCS] Update documentation for `max_token_length`
In 1.4 the behavior is different due to
https://issues.apache.org/jira/browse/LUCENE-5897
2015-01-27 13:52:14 -07:00
simaov 1ca8404674 #9444 join lines
Fixes #9445
2015-01-27 18:14:56 +00:00
simaov f3e1a66133 #9444 throw StrictDynamicMappingException exception if dynamic is 'strict' and undeclared field value is NULL, test for this
Fixes #9445
2015-01-27 18:14:56 +00:00
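
The check described in commit f3e1a66133 can be sketched as below. This is an illustration only: `StrictDynamicMappingError` and `check_field` are hypothetical Python stand-ins, and the mapping is reduced to a plain dictionary.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for StrictDynamicMappingException."""


def check_field(mapping, field, value):
    if field in mapping.get("properties", {}):
        return  # declared fields are always accepted
    if mapping.get("dynamic") == "strict":
        # The value is deliberately not inspected: a null value for an
        # undeclared field is rejected just like any other value.
        raise StrictDynamicMappingError(f"undeclared field [{field}]")


strict_mapping = {"dynamic": "strict", "properties": {"title": {"type": "string"}}}
check_field(strict_mapping, "title", None)  # declared field, accepted

try:
    check_field(strict_mapping, "unknown", None)
    null_rejected = False
except StrictDynamicMappingError:
    null_rejected = True
print(null_rejected)
```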
Lee Hinman 39c064ce8b [TEST] remove AwaitsFix from DeleteByQuery test 2015-01-27 10:15:44 -07:00
Ryan Ernst cff0ec3972 Mappings: Remove type level default analyzers
closes #8874
2015-01-27 08:30:51 -08:00
Colin Goodheart-Smithe 6f894b1d2c [TEST] Fix HistogramTests
Fixed histogram tests for value scripts, as they were picking the wrong buckets from the bucket list following the removal of the getBucketByKey method.
2015-01-27 12:10:38 +00:00
Martijn van Groningen 7e6e9dbb96 Aggs: nested agg needs to reset root doc between segments.
Closes #9437
Closes #9436
2015-01-27 12:53:47 +01:00
javanna 93bf737f34 Internal: fix shard state transport action names
When we renamed all of the transport actions in #7105, shard started and failed were flipped around by mistake. This commit fixes their naming.

Closes #9440
2015-01-27 12:38:16 +01:00
Colin Goodheart-Smithe 285ef0f06d Aggregations: Clean up response API for Aggregations
This change makes the response API object for Histogram Aggregations the same for all types of Histogram, and does the same for all types of Ranges.
The change removes getBucketByKey() from all aggregations except filters and terms. It also reduces the methods on the Bucket class to just getKey() and getKeyAsString().
The getKey() method returns Object, and the actual type it returns will be appropriate for the type of aggregation being run, e.g. date_histogram will return a DateTime for this method and Histogram will return a Number.
2015-01-27 10:53:44 +00:00
Sourav Mitra 78c52d559d Minor hygiene, Removed Redundant inheritance
Close #9427
2015-01-27 11:02:43 +01:00
Christian Verkerk 5b31189498 Docs: Update cluster.asciidoc
Clarify the preferencing.

Closes #9434
2015-01-27 10:48:40 +01:00
Lee Hinman 0143d835d4 [TEST] Add `ensureGreen` to indices created in TopHitsTests 2015-01-26 18:45:04 -07:00
Lee Hinman 8fc58dc00a [TEST] Add `ensureGreen` where needed in NestedTests 2015-01-26 18:26:04 -07:00
Lee Hinman 92b218ba51 [TEST] Mute DeleteByQueryTests.testDeleteAllOneIndex
See:
https://github.com/elasticsearch/elasticsearch/issues/9421
2015-01-26 18:01:08 -07:00
Aske Hansen 084dc7a656 Docs: added searchkick
Closes #9416
2015-01-26 21:58:36 +01:00
Martijn van Groningen a645994086 Aggs: fix handling of the same child doc id being processed multiple times in the `reverse_nested` aggregation.
Closes #9263
Closes #9345
2015-01-26 18:36:35 +01:00
Ryan Ernst 385c43c141 Mappings: Remove _analyzer
closes #9279
2015-01-26 09:14:17 -08:00
Lee Hinman 537769c225 Relax restrictions on filesystem size reporting
Apparently some filesystems such as ZFS and occasionally NTFS can report
filesystem usages that are negative, or above the maximum total size of
the filesystem. This relaxes the constraints on `DiskUsage` so that an
exception is not thrown.

If 0 is passed as the totalBytes, `.getFreeDiskAsPercentage()` will
always return 100.0% free (to ensure the disk threshold decider fails
open)

Fixes #9249
Relates to #9260
2015-01-26 09:46:21 -07:00
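
The fail-open behaviour from commit 537769c225 can be sketched as follows. The function below is a hypothetical Python rendering of `.getFreeDiskAsPercentage()`; only the "0 total bytes means 100.0% free" rule and the absence of range checks come from the commit message.

```python
def free_disk_as_percentage(total_bytes, free_bytes):
    if total_bytes == 0:
        # Fail open: report the disk as fully free so the threshold decider
        # does not block allocation on a bogus reading.
        return 100.0
    # No range checks: odd values reported by the filesystem no longer raise.
    return 100.0 * free_bytes / total_bytes


print(free_disk_as_percentage(0, 0))
print(free_disk_as_percentage(200, 50))
print(free_disk_as_percentage(100, -10))
```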
Martijn van Groningen 7ca2ef9b93 Nested aggregator: Fix handling of multiple buckets being emitted for the same parent doc id.
This bug was introduced by #8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small.

Closes #9317
Closes #9346
2015-01-26 17:41:25 +01:00
Britta Weber f8294352f7 [TEST] mute test for now because we have an issue for it 2015-01-26 17:30:43 +01:00
Martijn van Groningen f9c0e0d4c7 aggs: The `nested` aggregator's parent filter isn't resolved properly in the case the nested agg gets created on the fly for buckets that are constructed during query execution.
The fix is to move the parent filter resolving from the nextReader(...) method to the collect(...) method, because only then is any parent nested filter's parent filter properly instantiated.

Closes #9280
Closes #9335
2015-01-26 12:17:52 +01:00
Britta Weber c3f1982f21 [TEST] check that primaries succeeded
We want to check if at least the primaries succeeded if we do not
wait for green and not if all succeeded if we wait for green.
That was a misconception in c617af37e8
2015-01-26 11:14:38 +01:00
javanna aa6bf5fd1d Snapshot status api: make sure headers and request context are handed over to inner nodes request
Closes #9409
2015-01-26 10:33:57 +01:00
Boaz Leskes 974fafb2da Test: add logging to SearchWithRandomExceptionsTests 2015-01-26 09:59:01 +01:00
Robert Muir be3e60efc8 Upgrade to lucene r1654549 snapshot.
Closes #9402.

Squashed commit of the following:

commit 85c71b6478441a73738c81f02257193f9837f3ba
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Jan 24 11:24:36 2015 -0500

    upgrade to lucene r1654549 snapshot
2015-01-25 15:01:45 -05:00
Michael McCandless 50e9108305 Core: do not throttle recovery indexing operations when replaying transaction log
Closes #9396

Closes #9394
2015-01-23 17:41:37 -05:00
Ryan Ernst dfc2c9f3a1 Tests: Tweaking static bwc tests to improve stability 2015-01-23 13:59:42 -08:00
Ryan Ernst 54d01c48cb Tests: Add memory info to static bwc index tests. 2015-01-23 13:20:23 -08:00
Andrew Ochsner f95fa83e5b Closes #9398 2015-01-23 18:06:12 +01:00
Britta Weber c617af37e8 [TESTS] ensureGreen, else reported successful shards will be lower than expected 2015-01-23 17:12:33 +01:00