Commit Graph

21 Commits

Author SHA1 Message Date
Simon Willnauer d2493ea48a [CORE] Support parsing lucene minor version strings
We parse the version that is shipped with the Lucene segments in order
to find the version of lucene that wrote a particular segment. Yet, some lucene
version ie:
 * 4.3.1 (Elasticsearch 0.90.2)
 * 4.5.1 (Elasticsearch 0.90.7)
 * 3.6.1 (pre Elasticsearch 0.90.0)

wrote illegal strings containing the minor version which causes IAE exceptions
being thrown from lucenes parsing method.

Closes #7055
2014-07-28 13:02:00 +02:00
Boaz Leskes b2b443130f Fix forbidden API syntax error 2014-07-01 19:49:57 +02:00
Boaz Leskes 72d2ac1328 Better support for partial buffer reads/writes in translog infrastructure
Some IO api can return after writing & reading only a part of the requested data. On these rare occasions, we should call the methods again to read/write the rest of the data. This has cause rare translog corruption while writing huge documents on Windows.

Noteful parts of the commit:
- A new Channels class with utility methods for reading and writing to channels
- Writing or reading to channels is added to the forbidden API list
- Added locking to SimpleFsTranslogFile
- Removed FileChannelInputStream which was not used

Closes #6441 , #6576
2014-07-01 19:11:36 +02:00
Robert Muir a3d5381392 Disable explicit GC by default
We don't rely upon GC to cleanup mappedbytebuffers, we unmap them
explicitly on close in lucene. But the JDK has crazy loops with
explicit GCs in exceptional cases to try to force unmapping.

In general we don't want any of our code or library code calling
this method: so its banned in forbidden-apis as well.
2014-06-27 14:09:44 +02:00
Robert Muir b55ad98d73 Upgrade to Lucene 4.9 (closes #6623) 2014-06-26 08:18:59 -04:00
Simon Willnauer 797a9b07ef FileSystem: Use XNativeFSLockFactory instead of the buggy Lucene 4.8.1 version
There is a pretty nasty bug in the lock factory we use that can cause
nodes to use the same data dir wiping each others data. Luckily this is
unlikely to happen if the nodes are running in different JVM which they
do unless they are embedded.

See LUCENE-5738

Closes #6424
2014-06-06 11:51:47 +02:00
Martijn van Groningen e2a2f13f17 Added FilteredQuery to the list of forbidden apis 2014-05-08 09:54:10 +02:00
Shay Banon 23f200bc0e Use non analyzed token stream optimization everywhere
In the string type, we have an optimization to reuse the StringTokenStream on a thread local when a non analyzed field is used (instead of creating it each time). We should use this across the board on all places where we create a field with a String.
Also, move to a specific XStringField, that we can reuse StringTokenStream instead of copying it.
closes #6001
2014-04-30 17:18:15 -04:00
Robert Muir 8e0a479316 Upgrade to Lucene 4.8
Closes #5932
2014-04-28 06:45:50 -04:00
Robert Muir 8568c18e6f Change default numeric precision_step
Change the default numeric precision_step to 16 for 64-bit types,
8 for 32-bit and 16-bit types. Disable precision_step for the 8-bit
byte type.

Closes #5905
2014-04-23 09:01:25 -04:00
Simon Willnauer 49d84cb47f [JAVA7 Upgrade] Move to Long.compare 2014-03-27 15:48:12 +01:00
Adrien Grand b5b82626e7 Forbid Math.abs(int/long).
We have had a couple of bugs because of the use of these methods without paying
attention that it might return a negative value when provided with MIN_VALUE.
There is one common and legitimate usage of this method in order to perform
a modulo operation which would always return a positive number. This use-case
has been extracted to MathUtils.mod.

Close #5562
2014-03-27 14:50:43 +01:00
Simon Willnauer 2398bb4f1c Close Directory / Store once all resources have been released
Currently we close the store and therefor the underlying directory
when the engine / shard is closed ie. during relocation etc. We also
just close it while there are still searches going on and/or we are
recovering from it. The recoveries might fail which is ok but searches
etc. will be working like pending fetch phases.

The contract of the Directory doesn't prevent to read from a stream
that was already opened before the Directory was closed but from a
system boundary perspective and from lifecycles that we test it seems
to be the right thing to do to wait until all resources are released.

Additionally it will also help to make sure everything is closed
properly before directories are closed itself.

Note: this commit adds Object#wait & Object@#notify/All to forbidden APIs

Closes #5432
2014-03-21 15:02:38 +01:00
Simon Willnauer 821173b5cf Enforce query instance checking before it wrapper as a filter
We have the default QueryWrapperFilter as well as our custom one while
our wrapper is explicitly marked as no_cache such that it will never
be included in a cache. This was not consistenly used and caused several
problems during tests where p/c related queries were used as filters
and ended up in the cache. This commit adds the QueryWrapperFilter
ctor to the forbidden APIs to enforce the query instance checks.
2014-03-14 20:18:01 +01:00
Shay Banon 992747a159 Force merges to not happen when indexing a doc / flush
Today, even though our merge policy doesn't return new merge specs on SEGMENT_FLUSH, merge on the scheduler is still called on flush time, and can cause merges to stall indexing during merges. Both for the concurrent merge scheduler (the default) and the serial merge scheduler. This behavior become worse when throttling kicks in (today at 20mb per sec).

In order to solve it (outside of Lucene for now), we wrap the merge scheduler with an EnableMergeScheduler, where, on the thread level, using a thread local, the call to merge can be enabled/disabled.

A Merges helper class is added where all explicit merges operations should go through. If the scheduler is the enabled one, it will enable merges before calling the relevant explicit method call. In order to make sure Merges is the only class that calls the explicit merge calls, the IW variant of them is added to the forbidden APIs list.

closes #5319
2014-03-05 12:26:26 +00:00
Adrien Grand fa094a46fd Add IndexReader reference counting methods to forbidden APIs. 2014-02-06 21:00:56 +01:00
Simon Willnauer 9cf8251a0d Add RamUsageEstimator#sizeOf(Object) to forbidden APIs
This method can be a performance trap since it traverse the
entire object tree that is referenced by the provided object.
See LUCENE-5373
2014-01-31 21:43:20 +01:00
Adrien Grand 420a3ed691 Forbid usage of StringReader in favor of FastStringReader.
StringReader is synchronized although input streams should always be consumed
by a single thread at a time. FastStringReader on the other hand is completely
thread unsafe.

Closes #3411
2013-07-31 09:34:18 +02:00
Shay Banon 8ac77b6119 cleanup signatures file 2013-07-26 18:32:51 +02:00
Adrien Grand c20d44a1ff Forbid usage of Character.codePoint(At|Before) and Collections.sort.
Character.codePointAt and codePointBefore have two versions: one which only
accepts an offset, and one which accepts an offset and a limit. The former can
be dangerous when working with buffers of characters because if the offset
is the last char of the buffer, a char outside the buffer might be used to
compute the code point, so one should always use the version which accepts a
limit.

Collections.sort is wasteful on random-access lists: it dumps data into an
array, sorts the list and then adds elements back to the list. However, the
sorting can easily be performed in-place by using Lucene's
CollectionUtil.(merge|quick|tim)Sort.
2013-06-13 10:14:35 +02:00
Simon Willnauer 31f0aca65d Integrate forbiddenAPI checks into Maven build.
This commit integrates the forbiddenAPI checks that checks
Java byte code against a list of "forbidden" API signatures.
The commit also contains the fixes of the current source code
that didn't pass the default API checks.

See https://code.google.com/p/forbidden-apis/ for details.

Closes #3059
2013-05-19 23:25:44 +02:00